Genomic Data Processing 83: Moving GRCH38Index to Every Node


1. Move the index out of cloud/adam into xubo/ref:

hadoop@Master:~/cloud/adam/xubo/data/test20160310$ mkdir -p ~/xubo/ref/GRCH38Index/
hadoop@Master:~/cloud/adam/xubo/data/test20160310$ mv GCA_000001405.15_GRCh38/* ~/xubo/ref/GRCH38Index/
hadoop@Master:~/cloud/adam/xubo/data/test20160310$ cd ~/xubo/ref/GRCH38Index/
hadoop@Master:~/xubo/ref/GRCH38Index$ ls
createFastqBywgsim.sh   GCA_000001405.15_GRCh38_full_analysis_set.fna      GCA_000001405.15_GRCh38_full_analysis_set.fna.ann  GCA_000001405.15_GRCh38_full_analysis_set.fna.pac
createFastqBywgsim.txt  GCA_000001405.15_GRCh38_full_analysis_set.fna.alt  GCA_000001405.15_GRCh38_full_analysis_set.fna.bwt  GCA_000001405.15_GRCh38_full_analysis_set.fna.sa
fastq                   GCA_000001405.15_GRCh38_full_analysis_set.fna.amb  GCA_000001405.15_GRCh38_full_analysis_set.fna.fai
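
The .amb, .ann, .bwt, .pac and .sa files in the listing are the five files that `bwa index` writes. Before distributing the directory, it may be worth confirming that none of them was lost in the move. A minimal sketch follows; the function name `check_bwa_index` is made up for illustration and is not part of BWA.

```shell
# check_bwa_index PREFIX
# Succeeds only if all five files written by `bwa index` exist and are non-empty.
# (Hypothetical helper for illustration; not part of BWA itself.)
check_bwa_index() {
    prefix="$1"
    missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing or empty: $prefix.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For the directory above, this could be called as `check_bwa_index ~/xubo/ref/GRCH38Index/GCA_000001405.15_GRCh38_full_analysis_set.fna`.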

2. Create the directory on every node:

mkdir -p ~/xubo/ref/
ssh Mcnode2
mkdir -p ~/xubo/ref/
ssh Mcnode3
mkdir -p ~/xubo/ref/
ssh Mcnode4
mkdir -p ~/xubo/ref/
ssh Mcnode5
mkdir -p ~/xubo/ref/
ssh Mcnode6
mkdir -p ~/xubo/ref/
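
Logging in to each node by hand works, but the same step can be scripted. The sketch below assumes passwordless SSH from Master to the nodes named above. `NODES`, `REMOTE_SHELL` and `make_ref_dirs` are illustrative names, not something from the original setup.

```shell
# Node names as used in the session above; adjust for your own cluster.
NODES="${NODES:-Mcnode2 Mcnode3 Mcnode4 Mcnode5 Mcnode6}"
# REMOTE_SHELL is a hypothetical hook: set it to `echo` for a dry run.
REMOTE_SHELL="${REMOTE_SHELL:-ssh}"

make_ref_dirs() {
    for node in $NODES; do
        # Single quotes keep ~ unexpanded locally, so it resolves on the remote host.
        $REMOTE_SHELL "$node" 'mkdir -p ~/xubo/ref/' || echo "mkdir failed on $node" >&2
    done
}

# Usage: make_ref_dirs
```

Running it with `REMOTE_SHELL=echo` first prints the commands that would be sent, which is a cheap way to check the node list.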

3. Distribute the index to every node:

hadoop@Master:~/xubo/ref$ dispatch.sh GRCH38Index/

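This is fairly time-consuming: in the log below, each pass copies about 8.3 GB at roughly 10 MB/s and takes 13 to 15 minutes, and the six passes add up to about 83 minutes.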

hadoop@Master:~/xubo/ref$ dispatch.sh GRCH38Index/
GCA_000001405.15_GRCh38_full_analysis_set.fna.ann  100%   72KB  71.7KB/s   00:00
createFastqBywgsim.sh                              100%  541     0.5KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.pac  100%  765MB  10.8MB/s   01:11
GCA_000001405.15_GRCh38_full_analysis_set.fna.sa   100% 1530MB  10.5MB/s   02:26
GCA_000001405.15_GRCh38_full_analysis_set.fna      100% 3105MB  10.7MB/s   04:50
GCA_000001405.15_GRCh38_full_analysis_set.fna.amb  100%   20KB  19.7KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.alt  100%  214KB 214.2KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.fai  100%   19KB  19.0KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.bwt  100% 3061MB  10.6MB/s   04:49
GCA_000001405.15_GRCh38_full_analysis_set.fna.ann  100%   72KB  71.7KB/s   00:00
createFastqBywgsim.sh                              100%  541     0.5KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.pac  100%  765MB  10.5MB/s   01:13
GCA_000001405.15_GRCh38_full_analysis_set.fna.sa   100% 1530MB  10.7MB/s   02:23
GCA_000001405.15_GRCh38_full_analysis_set.fna      100% 3105MB  10.7MB/s   04:50
GCA_000001405.15_GRCh38_full_analysis_set.fna.amb  100%   20KB  19.7KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.alt  100%  214KB 214.2KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.fai  100%   19KB  19.0KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.bwt  100% 3061MB  10.3MB/s   04:57
GCA_000001405.15_GRCh38_full_analysis_set.fna.ann  100%   72KB  71.7KB/s   00:00
createFastqBywgsim.sh                              100%  541     0.5KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.pac  100%  765MB  10.9MB/s   01:10
GCA_000001405.15_GRCh38_full_analysis_set.fna.sa   100% 1530MB   8.3MB/s   03:04
GCA_000001405.15_GRCh38_full_analysis_set.fna      100% 3105MB   9.9MB/s   05:13
GCA_000001405.15_GRCh38_full_analysis_set.fna.amb  100%   20KB  19.7KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.alt  100%  214KB 214.2KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.fai  100%   19KB  19.0KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.bwt  100% 3061MB  10.3MB/s   04:58
GCA_000001405.15_GRCh38_full_analysis_set.fna.ann  100%   72KB  71.7KB/s   00:00
createFastqBywgsim.sh                              100%  541     0.5KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.pac  100%  765MB  10.9MB/s   01:10
GCA_000001405.15_GRCh38_full_analysis_set.fna.sa   100% 1530MB  10.1MB/s   02:32
GCA_000001405.15_GRCh38_full_analysis_set.fna      100% 3105MB   9.7MB/s   05:20
GCA_000001405.15_GRCh38_full_analysis_set.fna.amb  100%   20KB  19.7KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.alt  100%  214KB 214.2KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.fai  100%   19KB  19.0KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.bwt  100% 3061MB  10.4MB/s   04:54
GCA_000001405.15_GRCh38_full_analysis_set.fna.ann  100%   72KB  71.7KB/s   00:00
createFastqBywgsim.sh                              100%  541     0.5KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.pac  100%  765MB  10.8MB/s   01:11
GCA_000001405.15_GRCh38_full_analysis_set.fna.sa   100% 1530MB  10.8MB/s   02:22
GCA_000001405.15_GRCh38_full_analysis_set.fna      100% 3105MB  10.0MB/s   05:11
GCA_000001405.15_GRCh38_full_analysis_set.fna.amb  100%   20KB  19.7KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.alt  100%  214KB 214.2KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.fai  100%   19KB  19.0KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.bwt  100% 3061MB  10.9MB/s   04:42
GCA_000001405.15_GRCh38_full_analysis_set.fna.ann  100%   72KB  71.7KB/s   00:00
createFastqBywgsim.sh                              100%  541     0.5KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.pac  100%  765MB  10.6MB/s   01:12
GCA_000001405.15_GRCh38_full_analysis_set.fna.sa   100% 1530MB  10.4MB/s   02:27
GCA_000001405.15_GRCh38_full_analysis_set.fna      100% 3105MB   9.8MB/s   05:17
GCA_000001405.15_GRCh38_full_analysis_set.fna.amb  100%   20KB  19.7KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.alt  100%  214KB 214.2KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.fai  100%   19KB  19.0KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.bwt  100% 3061MB   9.7MB/s   05:15
hadoop@Master:~/xubo/ref$ mv GCA_000001405.15_GRCh38/* ~/xubo/ref/GRCH38Index/
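
The contents of dispatch.sh are not shown in this post. Judging only from the scp-style progress lines in the log, a rough stand-in might look like the sketch below. It assumes passwordless SSH and the same directory layout on every node. `NODES`, `COPY_CMD` and the `dispatch` function are illustrative names, not the author's script.

```shell
# Hypothetical stand-in for dispatch.sh; the original script is not shown.
NODES="${NODES:-Mcnode2 Mcnode3 Mcnode4 Mcnode5 Mcnode6}"
# COPY_CMD is an illustrative hook: set it to `echo` for a dry run.
COPY_CMD="${COPY_CMD:-scp -r}"

dispatch() {
    src="${1%/}"            # drop one trailing slash: GRCH38Index/ -> GRCH38Index
    dest_dir="$(pwd)"       # assume the same absolute path exists on every node
    for node in $NODES; do
        # Copies run one node after another, so total time grows with the node count.
        $COPY_CMD "$src" "$node:$dest_dir/" || echo "copy to $node failed" >&2
    done
}

# Usage, from ~/xubo/ref on Master: dispatch GRCH38Index/
```

Unlike the log above, `scp -r` would also copy subdirectories such as fastq, so the original script may well behave differently.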


Research output:

【1】 [BIBM] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Chao Wang, and Xuehai Zhou, "Distributed Gene Clinical Decision Support System Based on Cloud Computing", in IEEE International Conference on Bioinformatics and Biomedicine. (BIBM 2017, CCF B)
【2】 [IEEE CLOUD] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Xuehai Zhou. Efficient Distributed Smith-Waterman Algorithm Based on Apache Spark (CLOUD 2017, CCF-C).
【3】 [CCGrid] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Jinhong Zhou, Xuehai Zhou. DSA: Scalable Distributed Sequence Alignment System Using SIMD Instructions. (CCGrid 2017, CCF-C).
【4】 more: https://github.com/xubo245/Publications

Help

If you have any questions or suggestions, please open an issue in this project or send me an e-mail: xubo245@mail.ustc.edu.cn
Wechat: xu601450868
QQ: 601450868