Submitting Spark Jobs to YARN via the REST API on Ambari


Environment

  • JDK version: jdk1.8.0_66
  • HDP version: 2.4.2.0-258
  • Hadoop version: Hadoop 2.7.1.2.4.2.0-258
  • Spark version: 1.6.0.2.4

Prerequisites

  1. Upload /usr/hdp/2.4.2.0-258/spark/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar to HDFS.
  2. Upload /usr/hdp/2.4.2.0-258/spark/conf/spark-defaults.conf to HDFS and rename it spark-yarn.properties.
  3. Upload the application jar to HDFS (see the sketch below).
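
A minimal sketch of the three uploads, assuming the HDFS staging directory /Test/Spark that the JSON payload below references (adjust the paths to your environment; wc_v1.00.jar is the sample application jar used throughout this post):

# hdfs dfs -mkdir -p /Test/Spark
# hdfs dfs -put /usr/hdp/2.4.2.0-258/spark/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar /Test/Spark/
# hdfs dfs -put /usr/hdp/2.4.2.0-258/spark/conf/spark-defaults.conf /Test/Spark/spark-yarn.properties
# hdfs dfs -put wc_v1.00.jar /Test/Spark/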

Request an application ID from YARN

# curl -v -X POST 'http://10.2.45.231:8088/ws/v1/cluster/apps/new-application'

The response contains the new application ID, here application_1472797340021_0302.
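
For reference, the response body of the new-application endpoint looks like this (the maximum-resource-capability values are illustrative and depend on your cluster):

{
  "application-id": "application_1472797340021_0302",
  "maximum-resource-capability": {
    "memory": 8192,
    "vCores": 3
  }
}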

Get the size and timestamp of a file on HDFS

# hadoop fs -stat '%b  %Y' /demo.txt
356  1473843721218

Write the JSON payload and save it as spark-yarn.json (it is referenced by the submit command below):

{    "am-container-spec":{        "commands":{            "command":" /opt/jdk1.8.0_66/bin/java -Xmx1024m org.apache.spark.deploy.yarn.ApplicationMaster --jar __app__.jar --class com.zdhuang.WordCount --args hdfs://pdmiCluster/demo.txt --args hdfs://pdmiCluster/output "        },        "environment":{            "entry":[                {                    "key":"SPARK_YARN_MODE",                    "value":true                },                {                    "key":"SPARK_YARN_STAGING_DIR",                    "value":" "                },                {                    "key":"HDP_VERSION",                    "value":"2.4.2.0-258"                },                {                    "key":"CLASSPATH",                    "value":"__spark__.jar<CPS>__app__.jar<CPS>__app__.properties<CPS>/usr/hdp/2.4.2.0-258/spark/conf<CPS>/usr/hdp/2.4.2.0-258/spark/lib/*<CPS>/usr/hdp/2.4.2.0-258/hadoop/conf<CPS>/usr/hdp/2.4.2.0-258/hadoop/lib/*<CPS>/usr/hdp/2.4.2.0-258/hadoop/.//*<CPS>/usr/hdp/2.4.2.0-258/hadoop-hdfs/./<CPS>/usr/hdp/2.4.2.0-258/hadoop-hdfs/lib/*<CPS>/usr/hdp/2.4.2.0-258/hadoop-hdfs/.//*<CPS>/usr/hdp/2.4.2.0-258/hadoop-yarn/lib/*<CPS>/usr/hdp/2.4.2.0-258/hadoop-yarn/.//*<CPS>/usr/hdp/2.4.2.0-258/hadoop-mapreduce/lib/*<CPS>/usr/hdp/2.4.2.0-258/hadoop-mapreduce/.//*<CPS>/usr/share/java/mysql-connector-java-5.1.17.jar<CPS>/usr/share/java/mysql-connector-java.jar<CPS>/usr/hdp/current/hadoop-mapreduce-client/*"               },               {                      "key":"SPARK_YARN_CACHE_FILES",                    "value":"hdfs://pdmiCluster/Test/Spark/wc_v1.00.jar#__app__.jar,hdfs://pdmiCluster/Test/Spark/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar#__spark__.jar"                },                {                    "key":"SPARK_YARN_CACHE_FILES_FILE_SIZES",                    "value":"16008,185971201"                },                {                    "key":"SPARK_YARN_CACHE_FILES_TIME_STAMPS",                    "value":"1473924988796,1474440112743"                },                {                    "key":"SPARK_YARN_CACHE_FILES_VISIBILITIES",                    "value":"PUBLIC,PRIVATE"                }            ]        },        "local-resources":{            "entry":[                {                    "key":"__app__.jar",                    "value":{                        "resource":"hdfs://pdmiCluster/Test/Spark/wc_v1.00.jar",                        "size":16008,                        "timestamp":1473924988796,                        "type":"FILE",                        "visibility":"APPLICATION"                    }                },                {                    "key":"__spark__.jar",                    "value":{                        "resource":"hdfs://pdmiCluster/Test/Spark/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar",                        "size":185971201,                        "timestamp":1474440112743,                        "type":"FILE",                        "visibility":"APPLICATION"                    }                },                {                    "key":"__app__.properties",                    "value":{                        "resource":"hdfs://pdmiCluster/Test/Spark/spark-yarn.properties",                        "size":963,                        "timestamp":1474533638137,                        "type":"FILE",                        "visibility":"APPLICATION"                    }                }            ]        }    },    "application-id":"application_1472797340021_0302",    
"application-name":"appsjc",    "application-type":"SPARK",    "queue":"test",    "priority":3,    "keep-containers-across-application-attempts":false,    "max-app-attempts":2,    "resource":{        "memory":1024,        "vCores":1    },    "unmanaged-AM":false}

Explanation of the JSON fields:

1. command: the launch command

"command":" /opt/jdk1.8.0_66/bin/java -Xmx1024m org.apache.spark.deploy.yarn.ApplicationMaster --jar __app__.jar --class com.zdhuang.WordCount --args hdfs://pdmiCluster/demo.txt --args hdfs://pdmiCluster/output "

That is: the java binary, the JVM heap size (-Xmx1024m), the Spark ApplicationMaster entry class, the application jar alias, the main class, the input file, and the output directory.

Adjust these parameters to match your own environment.

pdmiCluster is the value of dfs.nameservices (the HDFS nameservice ID).
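
For comparison, this AM command corresponds roughly to the following spark-submit invocation (a sketch only; the jar path and arguments mirror the JSON above):

# spark-submit --master yarn --deploy-mode cluster \
    --class com.zdhuang.WordCount \
    hdfs://pdmiCluster/Test/Spark/wc_v1.00.jar \
    hdfs://pdmiCluster/demo.txt hdfs://pdmiCluster/output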

2. HDP_VERSION

"key":"HDP_VERSION","value":"2.4.2.0-258"classpath:

3. SPARK_YARN_CACHE_FILES

A comma-separated list: the full HDFS path of the application jar followed by #alias, then the full HDFS path of the spark-assembly jar followed by #alias.

4. SPARK_YARN_CACHE_FILES_FILE_SIZES

The sizes in bytes of the application jar and the spark-assembly jar on HDFS, in the same order.

5. SPARK_YARN_CACHE_FILES_TIME_STAMPS

The HDFS modification timestamps (in milliseconds) of the application jar and the spark-assembly jar, in the same order.

6. The value objects in the local-resources entries

  • resource: the full HDFS path of the file
  • size: the file size in bytes on HDFS
  • timestamp: the HDFS modification timestamp (see the sketch after this list)
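
The sizes and timestamps, both in the SPARK_YARN_CACHE_FILES_* entries and in local-resources, must match the actual files on HDFS, otherwise YARN fails to localize the resources. They can be read with the hadoop fs -stat command shown earlier; for example, the values used in the JSON above:

# hadoop fs -stat '%b  %Y' /Test/Spark/wc_v1.00.jar
16008  1473924988796
# hadoop fs -stat '%b  %Y' /Test/Spark/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar
185971201  1474440112743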

7. Other fields

  • application-id: the application ID requested from YARN earlier
  • application-name: the name of the application
  • application-type: the application type, here SPARK
  • queue: the YARN queue to submit into
  • priority: the application priority

Submit the job

# curl -s -i -X POST -H 'Accept: application/json' -H 'Content-Type: application/json' http://10.2.45.231:8088/ws/v1/cluster/apps --data-binary @spark-yarn.json
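
On success YARN replies with 202 Accepted and a Location header pointing at the new application, roughly:

HTTP/1.1 202 Accepted
Location: http://10.2.45.231:8088/ws/v1/cluster/apps/application_1472797340021_0302
Content-Type: application/json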

Check the results

In the YARN ResourceManager UI, the application should appear and finish with final status SUCCEEDED.
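
The application state can also be polled through the same REST API instead of the UI, e.g.:

# curl -s 'http://10.2.45.231:8088/ws/v1/cluster/apps/application_1472797340021_0302/state'
{"state":"FINISHED"}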

On HDFS:

# hdfs dfs -ls /output
Found 3 items
-rw-r--r--   3 yarn hdfs          0 2016-09-22 17:59 /output/_SUCCESS
-rw-r--r--   3 yarn hdfs        237 2016-09-22 17:59 /output/part-00000
-rw-r--r--   3 yarn hdfs        254 2016-09-22 17:59 /output/part-00001
# hdfs dfs -cat /output/part-00000
(allows,1)
(resource,1)
(is,3)
(Uniform,1)
(file,,2)
(address,1)
(anything,1)
(audio,1)
(When,1)
(some,1)
(get,1)
(locate,1)
(been,1)
(what's,1)
(thing,1)
(resource.,2)
(with,,1)
(just,1)
(what,1)
(accessed;,1)
(Resource,1)
(Locator,,1)