Submitting Spark to YARN via REST on Ambari
Environment
- JDK version: jdk1.8.0_66
- HDP version: 2.4.2.0-258
- Hadoop version: Hadoop 2.7.1.2.4.2.0-258
- Spark version: 1.6.0.2.4
Preparation
- Upload /usr/hdp/2.4.2.0-258/spark/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar to HDFS
- Upload /usr/hdp/2.4.2.0-258/spark/conf/spark-defaults.conf to HDFS and rename it spark-yarn.properties
- Upload the application jar to HDFS
Request an application ID from YARN
# curl -v -X POST 'http://10.2.45.231:8088/ws/v1/cluster/apps/new-application'
The response contains the new application ID: application_1472797340021_0302
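The new-application response is JSON whose `application-id` field carries the ID used in the rest of this walkthrough. A minimal sketch of extracting it (the sample body below follows the documented response shape, with this post's values filled in; it is not a captured response):

```python
import json

def parse_new_application(body):
    """Extract the application ID from the body returned by
    POST /ws/v1/cluster/apps/new-application."""
    return json.loads(body)["application-id"]

# Sample response shape; a real cluster returns its own ID and limits.
sample = ('{"application-id":"application_1472797340021_0302",'
          '"maximum-resource-capability":{"memory":8192,"vCores":4}}')
app_id = parse_new_application(sample)
```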
Get the size and modification timestamp of a file on HDFS
# hadoop fs -stat '%b %Y' /demo.txt
356 1473843721218
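The two numbers printed by `%b %Y` are the size in bytes and the modification time in milliseconds; both go verbatim into the JSON payload below. A small sketch of splitting that output:

```python
def parse_stat(output):
    """Parse 'hadoop fs -stat %b %Y' output into
    (size_in_bytes, modification_timestamp_ms)."""
    size, timestamp = output.split()
    return int(size), int(timestamp)

# Output for /demo.txt from this walkthrough:
size, ts = parse_stat("356 1473843721218")
```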
Write the JSON payload and save it as spark-yarn.json:
{
  "am-container-spec": {
    "commands": {
      "command": "/opt/jdk1.8.0_66/bin/java -Xmx1024m org.apache.spark.deploy.yarn.ApplicationMaster --jar __app__.jar --class com.zdhuang.WordCount --args hdfs://pdmiCluster/demo.txt --args hdfs://pdmiCluster/output"
    },
    "environment": {
      "entry": [
        { "key": "SPARK_YARN_MODE", "value": true },
        { "key": "SPARK_YARN_STAGING_DIR", "value": " " },
        { "key": "HDP_VERSION", "value": "2.4.2.0-258" },
        { "key": "CLASSPATH", "value": "__spark__.jar<CPS>__app__.jar<CPS>__app__.properties<CPS>/usr/hdp/2.4.2.0-258/spark/conf<CPS>/usr/hdp/2.4.2.0-258/spark/lib/*<CPS>/usr/hdp/2.4.2.0-258/hadoop/conf<CPS>/usr/hdp/2.4.2.0-258/hadoop/lib/*<CPS>/usr/hdp/2.4.2.0-258/hadoop/.//*<CPS>/usr/hdp/2.4.2.0-258/hadoop-hdfs/./<CPS>/usr/hdp/2.4.2.0-258/hadoop-hdfs/lib/*<CPS>/usr/hdp/2.4.2.0-258/hadoop-hdfs/.//*<CPS>/usr/hdp/2.4.2.0-258/hadoop-yarn/lib/*<CPS>/usr/hdp/2.4.2.0-258/hadoop-yarn/.//*<CPS>/usr/hdp/2.4.2.0-258/hadoop-mapreduce/lib/*<CPS>/usr/hdp/2.4.2.0-258/hadoop-mapreduce/.//*<CPS>/usr/share/java/mysql-connector-java-5.1.17.jar<CPS>/usr/share/java/mysql-connector-java.jar<CPS>/usr/hdp/current/hadoop-mapreduce-client/*" },
        { "key": "SPARK_YARN_CACHE_FILES", "value": "hdfs://pdmiCluster/Test/Spark/wc_v1.00.jar#__app__.jar,hdfs://pdmiCluster/Test/Spark/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar#__spark__.jar" },
        { "key": "SPARK_YARN_CACHE_FILES_FILE_SIZES", "value": "16008,185971201" },
        { "key": "SPARK_YARN_CACHE_FILES_TIME_STAMPS", "value": "1473924988796,1474440112743" },
        { "key": "SPARK_YARN_CACHE_FILES_VISIBILITIES", "value": "PUBLIC,PRIVATE" }
      ]
    },
    "local-resources": {
      "entry": [
        {
          "key": "__app__.jar",
          "value": {
            "resource": "hdfs://pdmiCluster/Test/Spark/wc_v1.00.jar",
            "size": 16008,
            "timestamp": 1473924988796,
            "type": "FILE",
            "visibility": "APPLICATION"
          }
        },
        {
          "key": "__spark__.jar",
          "value": {
            "resource": "hdfs://pdmiCluster/Test/Spark/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar",
            "size": 185971201,
            "timestamp": 1474440112743,
            "type": "FILE",
            "visibility": "APPLICATION"
          }
        },
        {
          "key": "__app__.properties",
          "value": {
            "resource": "hdfs://pdmiCluster/Test/Spark/spark-yarn.properties",
            "size": 963,
            "timestamp": 1474533638137,
            "type": "FILE",
            "visibility": "APPLICATION"
          }
        }
      ]
    }
  },
  "application-id": "application_1472797340021_0302",
  "application-name": "appsjc",
  "application-type": "SPARK",
  "queue": "test",
  "priority": 3,
  "keep-containers-across-application-attempts": false,
  "max-app-attempts": 2,
  "resource": { "memory": 1024, "vCores": 1 },
  "unmanaged-AM": false
}
Explanation of the JSON fields:
1. command — the ApplicationMaster launch command
"command":" /opt/jdk1.8.0_66/bin/java -Xmx1024m org.apache.spark.deploy.yarn.ApplicationMaster --jar __app__.jar --class com.zdhuang.WordCount --args hdfs://pdmiCluster/demo.txt --args hdfs://pdmiCluster/output "
In order: the java binary and its heap size, the Spark ApplicationMaster entry class, the application jar alias (--jar), the main class (--class), and the input file and output directory (--args).
Adjust these values to match your own environment.
pdmiCluster is the value of dfs.nameservices (the HDFS nameservice).
2. HDP_VERSION — set this to your HDP version
"key":"HDP_VERSION","value":"2.4.2.0-258"
The same version string appears throughout the CLASSPATH entry, so update those paths as well.
3. SPARK_YARN_CACHE_FILES
Full HDFS path of the application jar#alias, then the full HDFS path of the spark-assembly jar#alias, comma-separated.
4. SPARK_YARN_CACHE_FILES_FILE_SIZES
Sizes (in bytes) of the application jar and the spark-assembly jar on HDFS.
5. SPARK_YARN_CACHE_FILES_TIME_STAMPS
Modification timestamps of the application jar and the spark-assembly jar on HDFS.
6. The value objects in the local-resources entry list
- resource: full HDFS path
- size: size on HDFS (bytes)
- timestamp: modification timestamp on HDFS
7. Other fields
- application-id: the application ID requested from YARN above
- application-name: the name of the application
- application-type: SPARK
- queue: the YARN queue name
- priority: the priority
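The fields explained above can also be assembled programmatically instead of editing the JSON by hand. A minimal sketch, using this walkthrough's paths, sizes, and timestamps (the CLASSPATH entry and the __app__.properties resource are omitted for brevity; add them the same way):

```python
import json

def build_submission(app_id, app_jar, assembly_jar, command,
                     queue="test", name="appsjc"):
    """Assemble the YARN application-submission payload.

    app_jar / assembly_jar are dicts with 'path', 'size', 'timestamp'
    taken from 'hadoop fs -stat %b %Y' on each HDFS file.
    """
    def resource(f):
        return {"resource": f["path"], "size": f["size"],
                "timestamp": f["timestamp"], "type": "FILE",
                "visibility": "APPLICATION"}

    env = [
        {"key": "SPARK_YARN_MODE", "value": True},
        {"key": "HDP_VERSION", "value": "2.4.2.0-258"},
        {"key": "SPARK_YARN_CACHE_FILES",
         "value": "%s#__app__.jar,%s#__spark__.jar"
                  % (app_jar["path"], assembly_jar["path"])},
        {"key": "SPARK_YARN_CACHE_FILES_FILE_SIZES",
         "value": "%d,%d" % (app_jar["size"], assembly_jar["size"])},
        {"key": "SPARK_YARN_CACHE_FILES_TIME_STAMPS",
         "value": "%d,%d" % (app_jar["timestamp"], assembly_jar["timestamp"])},
    ]
    return {
        "application-id": app_id,
        "application-name": name,
        "application-type": "SPARK",
        "queue": queue,
        "priority": 3,
        "max-app-attempts": 2,
        "resource": {"memory": 1024, "vCores": 1},
        "am-container-spec": {
            "commands": {"command": command},
            "environment": {"entry": env},
            "local-resources": {"entry": [
                {"key": "__app__.jar", "value": resource(app_jar)},
                {"key": "__spark__.jar", "value": resource(assembly_jar)},
            ]},
        },
    }

payload = build_submission(
    "application_1472797340021_0302",
    {"path": "hdfs://pdmiCluster/Test/Spark/wc_v1.00.jar",
     "size": 16008, "timestamp": 1473924988796},
    {"path": "hdfs://pdmiCluster/Test/Spark/spark-assembly-1.6.1.2.4.2.0-258-"
             "hadoop2.7.1.2.4.2.0-258.jar",
     "size": 185971201, "timestamp": 1474440112743},
    "/opt/jdk1.8.0_66/bin/java -Xmx1024m "
    "org.apache.spark.deploy.yarn.ApplicationMaster "
    "--jar __app__.jar --class com.zdhuang.WordCount "
    "--args hdfs://pdmiCluster/demo.txt --args hdfs://pdmiCluster/output")
body = json.dumps(payload)  # the body POSTed to /ws/v1/cluster/apps
```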
Submit the application
#curl -s -i -X POST -H 'Accept: application/json' -H 'Content-Type: application/json' http://10.2.45.231:8088/ws/v1/cluster/apps --data-binary @spark-yarn.json
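After submitting, the application can be polled at GET /ws/v1/cluster/apps/{application-id}; the response wraps an "app" object whose "state" and "finalStatus" fields report progress. A sketch of building that URL and reading the result (the sample body follows the documented response shape, not a captured one):

```python
import json

RM = "http://10.2.45.231:8088"  # the ResourceManager address used in this post

def status_url(app_id, rm=RM):
    """URL for polling a submitted application's status."""
    return "%s/ws/v1/cluster/apps/%s" % (rm, app_id)

def parse_status(body):
    """Extract (state, finalStatus) from an app-info response body."""
    app = json.loads(body)["app"]
    return app["state"], app["finalStatus"]

# Sample of the documented response shape:
sample = ('{"app":{"id":"application_1472797340021_0302",'
          '"state":"FINISHED","finalStatus":"SUCCEEDED"}}')
```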
Check the results
Check the application status in the YARN ResourceManager UI.
Check the output on HDFS:
# hdfs dfs -ls /output
Found 3 items
-rw-r--r--   3 yarn hdfs    0 2016-09-22 17:59 /output/_SUCCESS
-rw-r--r--   3 yarn hdfs  237 2016-09-22 17:59 /output/part-00000
-rw-r--r--   3 yarn hdfs  254 2016-09-22 17:59 /output/part-00001
# hdfs dfs -cat /output/part-00000
(allows,1)
(resource,1)
(is,3)
(Uniform,1)
(file,,2)
(address,1)
(anything,1)
(audio,1)
(When,1)
(some,1)
(get,1)
(locate,1)
(been,1)
(what's,1)
(thing,1)
(resource.,2)
(with,,1)
(just,1)
(what,1)
(accessed;,1)
(Resource,1)
(Locator,,1)