Hadoop MapReduce Commands

Source: Internet | Published by: 黄网络直播 | Editor: 程序博客网 | Time: 2024/06/06 02:50

Overview

All MapReduce commands are invoked through the bin/mapred script. Running mapred without any arguments prints a description of all commands.

Usage: mapred [--config confdir] COMMAND

    [hadoop@hadoopcluster78 bin]$ mapred
    Usage: mapred [--config confdir] COMMAND
           where COMMAND is one of:
      pipes                run a Pipes job
      job                  manipulate MapReduce jobs
      queue                get information regarding JobQueues
      classpath            prints the class path needed for running
                           mapreduce subcommands
      historyserver        run job history servers as a standalone daemon
      distcp <srcurl> <desturl> copy file or directories recursively
      archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
      hsadmin              job history server admin interface

    Most commands print help when invoked w/o parameters.
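
Because `--config` must come before the subcommand, scripts that call these tools can build the command line programmatically. The Python sketch below is only an illustration: the names `mapred_argv` and `run_mapred` are hypothetical, `mapred` is assumed to be on `PATH`, and the set of subcommands is copied from the help output above.

```python
import subprocess
from typing import List, Optional, Sequence

# Subcommands listed in the `mapred` help output above.
MAPRED_COMMANDS = {
    "pipes", "job", "queue", "classpath",
    "historyserver", "distcp", "archive", "hsadmin",
}


def mapred_argv(command: str, args: Sequence[str] = (),
                config_dir: Optional[str] = None) -> List[str]:
    """Build argv for: mapred [--config confdir] COMMAND [args...]."""
    if command not in MAPRED_COMMANDS:
        raise ValueError(f"unknown mapred command: {command!r}")
    argv = ["mapred"]
    if config_dir is not None:
        # --config is handled by the mapred script itself, so it precedes COMMAND.
        argv += ["--config", config_dir]
    argv.append(command)
    argv.extend(args)
    return argv


def run_mapred(command: str, args: Sequence[str] = (),
               config_dir: Optional[str] = None) -> str:
    """Run mapred (assumed to be on PATH) and return its standard output."""
    result = subprocess.run(mapred_argv(command, args, config_dir),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `mapred_argv("queue", ["-list"], config_dir="/etc/hadoop/conf")` (a hypothetical configuration directory) returns `["mapred", "--config", "/etc/hadoop/conf", "queue", "-list"]`. Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.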

User Commands

Commands useful for users of a Hadoop cluster:

`archive`

See: Hadoop Commands Guide

`classpath`

Prints the class path needed to get the Hadoop jar and the required libraries. The hdfs and yarn scripts provide the same command.

Usage: mapred classpath
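
The command prints the class path as a single separator-delimited string, which is convenient to pass to `java -cp` but awkward to inspect. A minimal Python sketch that splits it into entries; the function names and sample paths are hypothetical, and the separator is assumed to be the platform path separator (`:` on Linux).

```python
import os
import subprocess
from typing import List


def split_classpath(classpath: str, sep: str = os.pathsep) -> List[str]:
    """Split a class path string into entries, dropping empty segments.

    Entries are returned verbatim; wildcard entries ending in '*' are left
    for the JVM to expand.
    """
    return [entry for entry in classpath.strip().split(sep) if entry]


def mapred_classpath_entries() -> List[str]:
    """Run `mapred classpath` (assumed to be on PATH) and split its output."""
    out = subprocess.run(["mapred", "classpath"], capture_output=True,
                         text=True, check=True).stdout
    return split_classpath(out)
```

For example, `split_classpath("/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*", sep=":")` returns the two entries separately, which makes it easy to check whether a particular configuration directory or jar is on the path.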

`distcp`

Copies files or directories recursively. For examples, see: Hadoop Commands Guide.

`job`

Interacts with MapReduce jobs.

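| Option | Description |
| --- | --- |
| `-submit job-file` | Submits the job. |
| `-status job-id` | Prints the map and reduce completion percentages and all job counters. |
| `-counter job-id group-name counter-name` | Prints the value of the counter. |
| `-kill job-id` | Kills the specified job. |
| `-events job-id from-event-# #-of-events` | Prints the details of the events received by the JobTracker for the given range (see the examples below). |
| `-history [all] jobOutputDir` | Prints job details, plus details of failed and killed tasks. More details about the job, such as the successful tasks and the attempts made for each task, can be viewed with the `[all]` option. |
| `-list [all]` | Lists the jobs that are currently running; with `all`, lists all jobs. |
| `-kill-task task-id` | Kills the task. Killed tasks are not counted against failed attempts. |
| `-fail-task task-id` | Fails the task. Failed tasks are counted against failed attempts. By default a task is attempted at most 4 times and is not retried after that, so failing the same task four times with `-fail-task` stops it from being retried and causes the whole job to fail. |
| `-set-priority job-id priority` | Changes the priority of the job. Allowed values are VERY_HIGH, HIGH, NORMAL, LOW, and VERY_LOW. |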
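
The practical difference between `-kill-task` and `-fail-task` is how the attempt is counted. The toy Python model below is not Hadoop code; it only mirrors the counting rule in the table, assuming the default limit of 4 attempts (in Hadoop 2 this limit is configured by `mapreduce.map.maxattempts` and `mapreduce.reduce.maxattempts`). The class name `TaskAttemptModel` is hypothetical.

```python
from dataclasses import dataclass

# Default per-task attempt limit described in the table above.
DEFAULT_MAX_ATTEMPTS = 4


@dataclass
class TaskAttemptModel:
    """Toy model of one task's attempts (hypothetical, not Hadoop code)."""
    max_attempts: int = DEFAULT_MAX_ATTEMPTS
    failed: int = 0
    killed: int = 0

    def kill_task(self) -> None:
        # -kill-task: the attempt is retried and does NOT count as a failure.
        if not self.job_failed:
            self.killed += 1

    def fail_task(self) -> None:
        # -fail-task: the attempt counts against the attempt limit.
        if not self.job_failed:
            self.failed += 1

    @property
    def job_failed(self) -> bool:
        # Once failures reach the limit, the task is not retried and the job fails.
        return self.failed >= self.max_attempts


if __name__ == "__main__":
    task = TaskAttemptModel()
    for _ in range(10):
        task.kill_task()
    print(task.job_failed)  # False: killed attempts are not counted
    for _ in range(4):
        task.fail_task()
    print(task.job_failed)  # True: four failures exhaust the default limit
```

Killing the same task any number of times never fails the job in this model, while the fourth `-fail-task` does, which is why `-kill-task` is the safer choice when you only want to reschedule a stuck attempt.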

Examples:

    [hadoop@hadoopcluster78 bin]$ mapred job -events job_1437364567082_0109 0 100
    15/08/13 15:10:53 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
    Task completion events for job_1437364567082_0109
    Number of events (from 0) are: 1
    SUCCEEDED attempt_1437364567082_0109_m_000016_0 http://hadoopcluster83:13562/tasklog?plaintext=true&attemptid=attempt_1437364567082_0109_m_000016_0

    [hadoop@hadoopcluster78 bin]$ mapred job -kill-task attempt_1437364567082_0111_m_000000_4
    15/08/13 15:51:25 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
    Killed task attempt_1437364567082_0111_m_000000_4
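
In these examples a task attempt ID has the form `attempt_<cluster timestamp>_<job sequence>_<m|r>_<task sequence>_<attempt number>`, and the owning job ID is the matching `job_<cluster timestamp>_<job sequence>`. Note that `-kill-task` is documented as taking a task ID, but the example passes an attempt ID. The Python sketch below pulls these IDs apart, for example to find which job a killed attempt belongs to. It accepts only map (`m`) and reduce (`r`) attempts, assumes the `-events` line layout shown above, and all names in it are hypothetical.

```python
import re
from typing import NamedTuple, Tuple

# attempt_<cluster timestamp>_<job seq>_<m|r>_<task seq>_<attempt number>
ATTEMPT_RE = re.compile(r"^attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)$")


class AttemptId(NamedTuple):
    cluster_ts: str
    job_seq: str
    task_type: str  # "m" = map, "r" = reduce
    task_seq: str
    attempt: int

    @property
    def job_id(self) -> str:
        return f"job_{self.cluster_ts}_{self.job_seq}"

    @property
    def task_id(self) -> str:
        return f"task_{self.cluster_ts}_{self.job_seq}_{self.task_type}_{self.task_seq}"


def parse_attempt_id(text: str) -> AttemptId:
    """Split a map/reduce task attempt ID into its parts."""
    match = ATTEMPT_RE.match(text)
    if match is None:
        raise ValueError(f"not a map/reduce attempt ID: {text!r}")
    ts, job_seq, task_type, task_seq, attempt = match.groups()
    return AttemptId(ts, job_seq, task_type, task_seq, int(attempt))


def parse_event_line(line: str) -> Tuple[str, AttemptId, str]:
    """Split a `-events` line of the form: STATUS ATTEMPT_ID LOG_URL."""
    status, attempt, log_url = line.split()
    return status, parse_attempt_id(attempt), log_url
```

For the killed attempt above, `parse_attempt_id("attempt_1437364567082_0111_m_000000_4").job_id` is `"job_1437364567082_0111"`, and its `attempt` field is `4`. The numeric parts are kept as strings so that leading zeros survive when IDs are rebuilt.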

`pipes`

Runs a pipes job. For more on Pipes, see: Hadoop Pipes Programming

Hadoop Pipes lets C++ programmers write MapReduce programs. It allows users to mix C++ and Java implementations of five components: RecordReader, Mapper, Partitioner, Reducer, and RecordWriter.

Usage: mapred pipes [-conf <path>] [-jobconf <key=value>, <key=value>, ...] [-input <path>] [-output <path>] [-jar <jar file>] [-inputformat <class>] [-map <class>] [-partitioner <class>] [-reduce <class>] [-writer <class>] [-program <executable>] [-reduces <num>]

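| Option | Description |
| --- | --- |
| `-conf path` | Configuration file for the job |
| `-jobconf key=value, key=value, …` | Adds/overrides configuration for the job |
| `-input path` | Input path |
| `-output path` | Output path |
| `-jar jar file` | JAR file name |
| `-inputformat class` | InputFormat class |
| `-map class` | Java Map class |
| `-partitioner class` | Java Partitioner |
| `-reduce class` | Java Reduce class |
| `-writer class` | Java RecordWriter |
| `-program executable` | Executable URI |
| `-reduces num` | Number of reduces |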
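
Judging from the usage line, `-jobconf` takes all of its key=value pairs as one comma-separated argument, which is easy to get wrong when the command is assembled in a script. A minimal Python sketch that builds the command line from a subset of the options in the table; `pipes_argv`, the HDFS paths, and the program name are hypothetical.

```python
from typing import List, Mapping, Optional


def pipes_argv(input_path: str, output_path: str, program: str,
               jobconf: Optional[Mapping[str, str]] = None,
               reduces: Optional[int] = None) -> List[str]:
    """Build argv for `mapred pipes` from a subset of its options."""
    argv = ["mapred", "pipes",
            "-input", input_path,
            "-output", output_path,
            "-program", program]
    if jobconf:
        # All key=value pairs go into ONE comma-separated argument.
        pairs = ",".join(f"{key}={value}" for key, value in jobconf.items())
        argv += ["-jobconf", pairs]
    if reduces is not None:
        if reduces < 0:
            raise ValueError("reduces must be non-negative")
        argv += ["-reduces", str(reduces)]
    return argv
```

For example, `pipes_argv("/user/hadoop/input", "/user/hadoop/output", "/user/hadoop/bin/wordcount", reduces=2)` produces an argument list that can be handed to `subprocess.run`; options such as `-jar` or `-map` would be added the same way.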

`queue`

Interacts with job queues and displays job queue information.

Usage: mapred queue [-list] | [-info <job-queue-name> [-showJobs]] | [-showacls]

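| Option | Description |
| --- | --- |
| `-list` | Gets the list of job queues configured in the system, along with the scheduling information of each queue. |
| `-info job-queue-name [-showJobs]` | Displays the information and scheduling information of the specified job queue. With `-showJobs`, also lists the jobs currently running in that queue. |
| `-showacls` | Displays the queue names and the queue operations allowed for the current user. Only queues the current user can access are listed. |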

Examples:

    [hadoop@hadoopcluster78 bin]$ mapred queue -list
    15/08/13 14:25:30 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
    ======================
    Queue Name : default
    Queue State : running
    Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0, CurrentCapacity: 47.5

    [hadoop@hadoopcluster78 bin]$ mapred queue -info default
    15/08/13 14:28:45 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
    ======================
    Queue Name : default
    Queue State : running
    Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0, CurrentCapacity: 72.5

    [hadoop@hadoopcluster78 bin]$ mapred queue -info default -showJobs
    15/08/13 14:29:08 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
    ======================
    Queue Name : default
    Queue State : running
    Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0, CurrentCapacity: 72.5
    Total jobs:1
                      JobId         State         StartTime        UserName           Queue      Priority     UsedContainers     RsvdContainers     UsedMem     RsvdMem     NeededMem       AM info
    job_1437364567082_0107       RUNNING     1439447102615            root         default        NORMAL                 28                  0      29696M          0M        29696M    http://hadoopcluster79:8088/proxy/application_1437364567082_0107/

    [hadoop@hadoopcluster78 bin]$ mapred queue -showacls
    15/08/13 14:31:44 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
    Queue acls for user :  hadoop
    Queue  Operations
    =====================
    root  ADMINISTER_QUEUE,SUBMIT_APPLICATIONS
    default  ADMINISTER_QUEUE,SUBMIT_APPLICATIONS
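
The queue listing is meant for people, but its `Key : value` fields are regular enough to parse, for instance to track `CurrentCapacity` over time. The Python sketch below turns `mapred queue -list` output into dictionaries. It assumes each field is printed on its own line as in the examples; since the `Scheduling Info` contents depend on the scheduler, it simply collects whatever `Name: number` pairs it finds. The function name is hypothetical.

```python
import re
from typing import Dict, List

# One "Key : value" line of `mapred queue -list` / `-info` output.
FIELD_RE = re.compile(r"^\s*(Queue Name|Queue State|Scheduling Info)\s*:\s*(.*?)\s*$")
# "Name: number" pairs inside the Scheduling Info value.
METRIC_RE = re.compile(r"(\w+):\s*([\d.]+)")


def parse_queue_list(output: str) -> List[Dict[str, object]]:
    """Parse queue blocks from `mapred queue -list` output into dicts."""
    queues: List[Dict[str, object]] = []
    current: Dict[str, object] = {}
    for line in output.splitlines():
        match = FIELD_RE.match(line)
        if match is None:
            continue  # log lines, separators, job tables, etc.
        key, value = match.groups()
        if key == "Queue Name":
            current = {"name": value}
            queues.append(current)
        elif not current:
            continue  # field seen before any queue name; ignore it
        elif key == "Queue State":
            current["state"] = value
        else:
            current["scheduling"] = {
                name: float(number) for name, number in METRIC_RE.findall(value)
            }
    return queues
```

Fed the first example above, the parser returns one entry for the `default` queue with state `running` and a `scheduling` map containing `Capacity`, `MaximumCapacity`, and `CurrentCapacity`.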

Administration Commands

Commands useful for administrators of a Hadoop cluster.

`historyserver`

Starts the JobHistoryServer.

Usage: mapred historyserver

Alternatively, start or stop the JobHistoryServer with sbin/mr-jobhistory-daemon.sh start|stop historyserver.
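
When the JobHistoryServer is started and stopped from automation, a small wrapper around the daemon script keeps the `start|stop historyserver` form in one place. A minimal Python sketch, assuming the script lives under `$HADOOP_HOME/sbin` and that the `HADOOP_HOME` environment variable is set; both function names are hypothetical.

```python
import os
import subprocess
from typing import List


def jobhistory_daemon_argv(action: str, hadoop_home: str) -> List[str]:
    """Build argv for: sbin/mr-jobhistory-daemon.sh start|stop historyserver."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    script = os.path.join(hadoop_home, "sbin", "mr-jobhistory-daemon.sh")
    return [script, action, "historyserver"]


def control_jobhistory(action: str) -> None:
    """Start or stop the JobHistoryServer; assumes HADOOP_HOME is set."""
    hadoop_home = os.environ["HADOOP_HOME"]
    subprocess.run(jobhistory_daemon_argv(action, hadoop_home), check=True)
```

Validating the action up front means a typo such as `restart` fails immediately in Python instead of being passed through to the shell script.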

`hsadmin`

Runs an hsadmin client to execute JobHistoryServer administrative commands.

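| Option | Description |
| --- | --- |
| `-refreshUserToGroupsMappings` | Refreshes the user-to-groups mappings. |
| `-refreshSuperUserGroupsConfiguration` | Refreshes the superuser proxy groups mappings. |
| `-refreshAdminAcls` | Refreshes the ACLs for administration of the JobHistoryServer. |
| `-refreshLoadedJobCache` | Refreshes the JobHistoryServer's cache of loaded jobs. |
| `-refreshJobRetentionSettings` | Refreshes the job history retention period and the job cleaner settings. |
| `-refreshLogRetentionSettings` | Refreshes the log retention period and the log retention check interval. |
| `-getGroups [username]` | Gets the groups the given user belongs to. |
| `-help [cmd]` | Displays help for the given command, or for all commands if none is given. |

Example: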

    [hadoop@hadoopcluster78 bin]$ mapred hsadmin -getGroups hadoop
    hadoop : clustergroup
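
To use the group lookup in a script, the output has to be split into a user and a list of groups. A minimal Python sketch, assuming the `user : group1 group2 ...` layout seen in the example (one line per user, groups separated by spaces, log lines mixed in); the function name is hypothetical.

```python
import re
from typing import Dict, List

# Assumed line format: "<user> : <group1> <group2> ...".
GROUPS_RE = re.compile(r"^(\S+) :(.*)$")


def parse_get_groups(output: str) -> Dict[str, List[str]]:
    """Map each user in `mapred hsadmin -getGroups` output to its groups."""
    result: Dict[str, List[str]] = {}
    for line in output.splitlines():
        match = GROUPS_RE.match(line.strip())
        if match:  # skips log lines and anything not in "user : groups" form
            user, groups = match.groups()
            result[user] = groups.split()
    return result
```

For the example above, `parse_get_groups("hadoop : clustergroup")` returns `{"hadoop": ["clustergroup"]}`; a user with no groups maps to an empty list.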

1 0