Hadoop JobHistory Server


Through the history server you can view the records of MapReduce jobs that have already finished: how many map tasks and reduce tasks a job used, when it was submitted, when it was launched, when it completed, and so on.

By default the Hadoop history server is not running; it can be started with the following command:

$ sbin/mr-jobhistory-daemon.sh start historyserver

The history server's web UI is then available on port 19888 of that machine.
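To confirm that the daemon actually came up, you can look for the JobHistoryServer process and probe the web port. This is only a quick sketch: spark5 is the host name taken from the listings further down, so adjust it for your own cluster.

$ jps | grep JobHistoryServer
$ curl -s http://spark5:19888/jobhistory | head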

The history server can also run on a machine of its own; where it listens is controlled mainly by the following parameters:
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>0.0.0.0:10020</value>
</property>
Parameter description: the address of the MapReduce JobHistory Server.

<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
</property>
Parameter description: the address of the MapReduce JobHistory Server web UI.

Both parameters are configured in mapred-site.xml. After setting them, restart the Hadoop jobhistory daemon; historical jobs can then be browsed on the machine configured in mapreduce.jobhistory.webapp.address.
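A restart here is simply a stop followed by a start, run on the machine that hosts the history server:

$ sbin/mr-jobhistory-daemon.sh stop historyserver
$ sbin/mr-jobhistory-daemon.sh start historyserver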

The history data itself lives in HDFS; the following settings control which HDFS directories the job history records are stored in:

<property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
</property>

<property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
</property>

<property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/tmp/hadoop-yarn/staging</value>
</property>

The values above are the defaults; to change them, edit mapred-site.xml.

mapreduce.jobhistory.done-dir: the directory that stores the records of Hadoop jobs that have already finished;

mapreduce.jobhistory.intermediate-done-dir: the directory for jobs whose history has not yet been moved into done-dir, i.e. jobs that are still running or have only just finished.
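For example, history files that have not yet been moved into the done directory can be listed directly; the intermediate directory normally contains one subdirectory per submitting user (sparkadmin is simply the user from the listings below, and the path follows the default configuration shown above):

$ hadoop fs -ls /tmp/hadoop-yarn/staging/history/done_intermediate
$ hadoop fs -ls /tmp/hadoop-yarn/staging/history/done_intermediate/sparkadmin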


Viewing completed Hadoop jobs:

[sparkadmin@spark5 ~]$ hadoop fs -ls /tmp/hadoop-yarn/staging/history/done/
Found 3 items
drwxrwxrwx   - sparkadmin supergroup          0 2015-12-01 00:02 /tmp/hadoop-yarn/staging/history/done/2015
drwxrwx---   - sparkadmin supergroup          0 2016-12-01 00:07 /tmp/hadoop-yarn/staging/history/done/2016
drwxrwx---   - sparkadmin supergroup          0 2017-01-01 00:07 /tmp/hadoop-yarn/staging/history/done/2017


[sparkadmin@spark5 ~]$ hadoop fs -ls /tmp/hadoop-yarn/staging/history/done/2016
Found 12 items
drwxrwx---   - sparkadmin supergroup          0 2016-01-31 00:02 /tmp/hadoop-yarn/staging/history/done/2016/01
drwxrwx---   - sparkadmin supergroup          0 2016-02-29 00:02 /tmp/hadoop-yarn/staging/history/done/2016/02
drwxrwx---   - sparkadmin supergroup          0 2016-03-31 00:03 /tmp/hadoop-yarn/staging/history/done/2016/03
drwxrwx---   - sparkadmin supergroup          0 2016-04-30 00:02 /tmp/hadoop-yarn/staging/history/done/2016/04
drwxrwx---   - sparkadmin supergroup          0 2016-05-31 00:02 /tmp/hadoop-yarn/staging/history/done/2016/05
drwxrwx---   - sparkadmin supergroup          0 2016-06-30 00:02 /tmp/hadoop-yarn/staging/history/done/2016/06
drwxrwx---   - sparkadmin supergroup          0 2016-07-31 00:00 /tmp/hadoop-yarn/staging/history/done/2016/07
drwxrwx---   - sparkadmin supergroup          0 2016-08-31 00:00 /tmp/hadoop-yarn/staging/history/done/2016/08
drwxrwx---   - sparkadmin supergroup          0 2016-09-30 00:00 /tmp/hadoop-yarn/staging/history/done/2016/09
drwxrwx---   - sparkadmin supergroup          0 2016-10-31 00:06 /tmp/hadoop-yarn/staging/history/done/2016/10
drwxrwx---   - sparkadmin supergroup          0 2016-11-30 00:07 /tmp/hadoop-yarn/staging/history/done/2016/11
drwxrwx---   - sparkadmin supergroup          0 2016-12-31 00:07 /tmp/hadoop-yarn/staging/history/done/2016/12


[sparkadmin@spark5 ~]$ hadoop fs -ls /tmp/hadoop-yarn/staging/history/done/2016/12
17/01/09 10:05:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 31 items
drwxrwx---   - sparkadmin supergroup          0 2016-12-09 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/01
drwxrwx---   - sparkadmin supergroup          0 2016-12-10 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/02
drwxrwx---   - sparkadmin supergroup          0 2016-12-10 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/03
drwxrwx---   - sparkadmin supergroup          0 2016-12-11 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/04
drwxrwx---   - sparkadmin supergroup          0 2016-12-13 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/05
drwxrwx---   - sparkadmin supergroup          0 2016-12-13 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/06
drwxrwx---   - sparkadmin supergroup          0 2016-12-15 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/07
drwxrwx---   - sparkadmin supergroup          0 2016-12-16 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/08
drwxrwx---   - sparkadmin supergroup          0 2016-12-17 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/09
drwxrwx---   - sparkadmin supergroup          0 2016-12-18 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/10
drwxrwx---   - sparkadmin supergroup          0 2016-12-19 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/11
drwxrwx---   - sparkadmin supergroup          0 2016-12-20 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/12
drwxrwx---   - sparkadmin supergroup          0 2016-12-21 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/13
drwxrwx---   - sparkadmin supergroup          0 2016-12-22 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/14
drwxrwx---   - sparkadmin supergroup          0 2016-12-23 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/15
drwxrwx---   - sparkadmin supergroup          0 2016-12-24 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/16
drwxrwx---   - sparkadmin supergroup          0 2016-12-25 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/17
drwxrwx---   - sparkadmin supergroup          0 2016-12-26 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/18
drwxrwx---   - sparkadmin supergroup          0 2016-12-27 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/19
drwxrwx---   - sparkadmin supergroup          0 2016-12-28 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/20
drwxrwx---   - sparkadmin supergroup          0 2016-12-29 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/21
drwxrwx---   - sparkadmin supergroup          0 2016-12-30 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/22
drwxrwx---   - sparkadmin supergroup          0 2016-12-31 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/23
drwxrwx---   - sparkadmin supergroup          0 2017-01-01 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/24
drwxrwx---   - sparkadmin supergroup          0 2017-01-02 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/25
drwxrwx---   - sparkadmin supergroup          0 2017-01-03 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/26
drwxrwx---   - sparkadmin supergroup          0 2017-01-04 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/27
drwxrwx---   - sparkadmin supergroup          0 2017-01-05 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/28
drwxrwx---   - sparkadmin supergroup          0 2017-01-06 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/29
drwxrwx---   - sparkadmin supergroup          0 2017-01-07 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/30
drwxrwx---   - sparkadmin supergroup          0 2017-01-08 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/31


[sparkadmin@spark5 ~]$ hadoop fs -ls /tmp/hadoop-yarn/staging/history/done/2017/01/09
Found 2 items
drwxrwx---   - sparkadmin supergroup          0 2017-01-09 01:04 /tmp/hadoop-yarn/staging/history/done/2017/01/09/000041
drwxrwx---   - sparkadmin supergroup          0 2017-01-09 08:10 /tmp/hadoop-yarn/staging/history/done/2017/01/09/000042


[sparkadmin@spark5 ~]$ hadoop fs -ls /tmp/hadoop-yarn/staging/history/done/2017/01/09/000041
-rwxrwx---   3 sparkadmin supergroup      21171 2017-01-09 00:06 /tmp/hadoop-yarn/staging/history/done/2017/01/09/000041/job_1480040162446_41871-1483891560801-sparkadmin-QueryResult.jar-1483891570304-1-0-SUCCEEDED-default-1483891564771.jhist
-rwxrwx---   3 sparkadmin supergroup     119729 2017-01-09 00:06 /tmp/hadoop-yarn/staging/history/done/2017/01/09/000041/job_1480040162446_41871_conf.xml
-rwxrwx---   3 sparkadmin supergroup      21074 2017-01-09 00:06 /tmp/hadoop-yarn/staging/history/done/2017/01/09/000041/job_1480040162446_41872-1483891560520-sparkadmin-QueryResult.jar-1483891569932-1-0-SUCCEEDED-default-1483891564971.jhist
-rwxrwx---   3 sparkadmin supergroup     119533 2017-01-09 00:06 /tmp/hadoop-yarn/staging/history/done/2017/01/09/000041/job_1480040162446_41872_conf.xml

 
Because there are so many history records, they are stored in year/month/day subdirectories, which makes them easier to manage and to look up.
Each job's history is kept in two files, with the extensions *.jhist and *.xml.

The *.jhist file holds the detailed information about the job itself; the *.xml file records the complete configuration the job ran with.


The *.jhist file is made up entirely of JSON records, one per job event (job initialization, task attempts, job completion, and so on); the type field tells you what kind of event each record describes. A JOB_INITED event, for example, looks like this:

{
   "type": "JOB_INITED",
   "event": {
      "org.apache.hadoop.mapreduce.jobhistory.JobInited": {
         "jobid": "job_1388830974669_1215999",
         "launchTime": 1392477383583,
         "totalMaps": 1,
         "totalReduces": 1,
         "jobStatus": "INITED",
         "uberized": false
      }
   }
}
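Such records can also be pulled straight out of a history file from the command line, for example (a sketch; the path is one of the files from the listing above and will differ on your cluster):

$ hadoop fs -cat /tmp/hadoop-yarn/staging/history/done/2017/01/09/000041/job_1480040162446_41871-*.jhist | grep JOB_INITED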


If the data shown in the history server's web UI is not enough, you can analyze the directory configured by mapreduce.jobhistory.done-dir yourself and extract whatever information you are interested in.

For example: how many map tasks ran on a given day, how long the longest-running job took, how many MapReduce jobs each user ran, how many MapReduce jobs ran in total, and so on. This is very useful for monitoring a Hadoop cluster, and the numbers can feed decisions such as how much resource to allocate to a particular user.
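As a rough sketch of this kind of analysis: the .jhist file name itself encodes the job id, submit time, user, job name, finish time, map/reduce counts and final state, so a simple per-user job count for one day needs nothing more than the shell (the field positions assume the file-name layout visible in the listing above):

$ hadoop fs -ls -R /tmp/hadoop-yarn/staging/history/done/2017/01/09 \
      | grep '\.jhist$' \
      | awk -F/ '{print $NF}' \
      | awk -F- '{print $3}' \
      | sort | uniq -c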

The history server's web UI displays at most 20,000 historical job records; this limit can be changed with the parameter below, after which a restart of the Hadoop jobhistory daemon is all that is needed.
<property>
    <name>mapreduce.jobhistory.joblist.cache.size</name>
    <value>20000</value>
</property>

