Hadoop: MapReduce Commands
Source: Internet | Editor: 程序博客网 | Time: 2024/06/06 02:50
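Overview
All MapReduce commands are invoked through the bin/mapred script. Running the mapred script without any arguments prints a description of all commands.
Usage: mapred [--config confdir] COMMAND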
[hadoop@hadoopcluster78 bin]$ mapred
Usage: mapred [--config confdir] COMMAND
       where COMMAND is one of:
  pipes                run a Pipes job
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  classpath            prints the class path needed for running
                       mapreduce subcommands
  historyserver        run job history servers as a standalone daemon
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  hsadmin              job history server admin interface

Most commands print help when invoked w/o parameters.
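User Commands
Commands useful for users of a Hadoop cluster:
archive
Creates a Hadoop archive. See: Hadoop Commands Guide.
classpath
Prints the class path needed to get the Hadoop jar and the required libraries. The hdfs and yarn scripts provide the same command.
Usage: mapred classpath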
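The class path comes back as one long string, which is hard to scan by eye. The following is a minimal sketch for printing one entry per line, assuming the entries are separated by ':' as is usual on Linux. The function name split_classpath is hypothetical and not part of Hadoop.

```shell
#!/usr/bin/env bash
# split_classpath: read a class path string on stdin and print one entry per line.
# Assumption: entries are separated by ':'; empty entries are dropped.
split_classpath() {
  tr ':' '\n' | sed '/^$/d'
}

# Typical use on a node where the mapred script is on PATH:
#   mapred classpath | split_classpath
```

Because the function only reads stdin, it works just as well on the output of the hdfs or yarn classpath command.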
distcp
Copies files or directories recursively. See the examples in the Hadoop Commands Guide.
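job
The job command interacts with MapReduce jobs.
Usage: mapred job | [GENERIC_OPTIONS] | [-submit <job-file>] | [-status <job-id>] | [-counter <job-id> <group-name> <counter-name>] | [-kill <job-id>] | [-events <job-id> <from-event-#> <#-of-events>] | [-history [all] <jobOutputDir>] | [-list [all]] | [-kill-task <task-id>] | [-fail-task <task-id>] | [-set-priority <job-id> <priority>]
-kill job-id: Kills the job with the given job-id.
-events job-id from-event-# #-of-events: Prints details of the events received by the jobtracker for the given range (see the example for usage).
-history [all] jobOutputDir: Prints job details, plus details of failed and killed tasks and their causes. With the [all] option, more detail is shown, such as successful tasks and the attempts made for each task.
-list [all]: Prints the jobs that are currently running. With all, prints all jobs.
-kill-task task-id: Kills the task. Killed tasks are not counted against failed attempts.
-fail-task task-id: Fails the task. Failed tasks are counted against failed attempts.
By default a task is attempted at most 4 times; after that, no further attempts are made. So if fail-task is used to fail the same task four times, the task is not retried again and the whole job fails.
-set-priority job-id priority: Changes the priority of the job. Allowed values are VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW.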
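Since -set-priority accepts only the five names listed above, a script may want to check the value before calling mapred. This is a minimal sketch under that assumption. Both function names are hypothetical, and the wrapper reaches mapred only when the priority is valid.

```shell
#!/usr/bin/env bash
# is_valid_priority: succeed only for the five priority names listed above.
is_valid_priority() {
  case "$1" in
    VERY_HIGH|HIGH|NORMAL|LOW|VERY_LOW) return 0 ;;
    *) return 1 ;;
  esac
}

# set_job_priority: hypothetical wrapper that validates the priority first,
# then runs `mapred job -set-priority <job-id> <priority>`.
set_job_priority() {
  local job_id="$1" priority="$2"
  if ! is_valid_priority "$priority"; then
    echo "invalid priority: $priority" >&2
    return 2
  fi
  mapred job -set-priority "$job_id" "$priority"
}

# Example call (needs a running cluster):
#   set_job_priority job_1437364567082_0109 HIGH
```

The check is case-sensitive on purpose: the names are matched exactly as they are listed.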
Example:
[hadoop@hadoopcluster78 bin]$ mapred job -events job_1437364567082_0109 0 100
15/08/13 15:10:53 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
Task completion events for job_1437364567082_0109
Number of events (from 0) are: 1
SUCCEEDED attempt_1437364567082_0109_m_000016_0 http://hadoopcluster83:13562/tasklog?plaintext=true&attemptid=attempt_1437364567082_0109_m_000016_0
[hadoop@hadoopcluster78 bin]$ mapred job -kill-task attempt_1437364567082_0111_m_000000_4
15/08/13 15:51:25 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
Killed task attempt_1437364567082_0111_m_000000_4
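The task IDs passed to -kill-task and -fail-task in the example are task attempt IDs, and they encode the job, the task type, and the attempt number. The sketch below splits such an ID into its parts, assuming the layout attempt_<cluster timestamp>_<job sequence>_<m|r>_<task number>_<attempt number> seen in the output above. The function name parse_attempt_id is hypothetical.

```shell
#!/usr/bin/env bash
# parse_attempt_id: split a task attempt ID into its parts.
# Assumed layout: attempt_<cluster timestamp>_<job seq>_<m|r>_<task no>_<attempt no>
# Prints: "<job-id> <task type> <task number> <attempt number>"
parse_attempt_id() {
  local prefix ts seq type task attempt
  # Split on '_' into the six expected fields.
  IFS=_ read -r prefix ts seq type task attempt <<EOF
$1
EOF
  if [ "$prefix" != "attempt" ] || [ -z "$attempt" ]; then
    echo "not a task attempt ID: $1" >&2
    return 1
  fi
  echo "job_${ts}_${seq} ${type} ${task} ${attempt}"
}

parse_attempt_id attempt_1437364567082_0111_m_000000_4
# prints: job_1437364567082_0111 m 000000 4
```

The last field is the attempt number, which is useful when checking how many attempts a task has already used.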
pipes
Runs a pipes job. For more on pipes, see: Hadoop pipes programming.
Hadoop pipes lets C++ programmers write MapReduce programs. It allows users to mix C++ and Java implementations of five components: RecordReader, Mapper, Partitioner, Reducer, and RecordWriter.
Usage: mapred pipes [-conf <path>] [-jobconf <key=value>, <key=value>, ...] [-input <path>] [-output <path>] [-jar <jar file>] [-inputformat <class>] [-map <class>] [-partitioner <class>] [-reduce <class>] [-writer <class>] [-program <executable>] [-reduces <num>]
Option: Description
-conf path: Path to the job configuration file.
-jobconf key=value, key=value, …: Adds or overrides job configuration entries.
-input path: Input path.
-output path: Output path.
-jar jar file: JAR file name.
-inputformat class: InputFormat class.
-map class: Java Map class.
-partitioner class: Java Partitioner.
-reduce class: Java Reduce class.
-writer class: Java RecordWriter.
-program executable: URI of the executable.
-reduces num: Number of reduces.
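To show how these options fit together, here is a minimal sketch that assembles a mapred pipes command line from an input path, an output path, and a program URI. It only prints the command and submits nothing. The function name build_pipes_cmd and all paths are hypothetical placeholders, and a real job may need further options or configuration beyond the ones shown.

```shell
#!/usr/bin/env bash
# build_pipes_cmd: print a `mapred pipes` command line for the given arguments.
# Arguments: <input path> <output path> <program URI> [number of reduces]
# It only prints the command; nothing is submitted to the cluster.
build_pipes_cmd() {
  local input="$1" output="$2" program="$3" reduces="$4" cmd
  if [ -z "$input" ] || [ -z "$output" ] || [ -z "$program" ]; then
    echo "usage: build_pipes_cmd <input> <output> <program> [reduces]" >&2
    return 1
  fi
  cmd="mapred pipes -input $input -output $output -program $program"
  # -reduces is appended only when a value was supplied.
  if [ -n "$reduces" ]; then
    cmd="$cmd -reduces $reduces"
  fi
  echo "$cmd"
}

# The paths below are hypothetical placeholders.
build_pipes_cmd /user/hadoop/in /user/hadoop/out /user/hadoop/bin/wordcount 2
# prints: mapred pipes -input /user/hadoop/in -output /user/hadoop/out -program /user/hadoop/bin/wordcount -reduces 2
```

Printing the command instead of running it makes it easy to review the options before submitting the job by hand.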
queue
This command interacts with job queues and displays their information.
Usage: mapred queue [-list] | [-info <job-queue-name> [-showJobs]] | [-showacls]
-showJobs: When given together with -info, also displays the list of jobs currently running in the queue.
-showacls: Displays the queue names and the operations the current user is allowed to perform on each queue. Only queues the current user can access are listed.
[hadoop@hadoopcluster78 bin]$ mapred queue -list
15/08/13 14:25:30 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
======================
Queue Name : default
Queue State : running
Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0, CurrentCapacity: 47.5
[hadoop@hadoopcluster78 bin]$ mapred queue -info default
15/08/13 14:28:45 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
======================
Queue Name : default
Queue State : running
Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0, CurrentCapacity: 72.5
[hadoop@hadoopcluster78 bin]$ mapred queue -info default -showJobs
15/08/13 14:29:08 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
======================
Queue Name : default
Queue State : running
Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0, CurrentCapacity: 72.5
Total jobs:1
JobId State StartTime UserName Queue Priority UsedContainers RsvdContainers UsedMem RsvdMem NeededMem AM info
job_1437364567082_0107 RUNNING 1439447102615 root default NORMAL 28 0 29696M 0M 29696M http://hadoopcluster79:8088/proxy/application_1437364567082_0107/
[hadoop@hadoopcluster78 bin]$ mapred queue -showacls
15/08/13 14:31:44 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
Queue acls for user : hadoop
Queue Operations
=====================
root ADMINISTER_QUEUE,SUBMIT_APPLICATIONS
default ADMINISTER_QUEUE,SUBMIT_APPLICATIONS
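The "Scheduling Info" line in the output above carries the queue's current load, which is handy for monitoring. The following is a minimal sketch for pulling out the CurrentCapacity value, assuming the output format shown in the transcript. The function name queue_current_capacity is hypothetical.

```shell
#!/usr/bin/env bash
# queue_current_capacity: read `mapred queue -info <queue>` output on stdin
# and print the CurrentCapacity value found on the "Scheduling Info" line.
# Assumption: the line looks like
#   Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0, CurrentCapacity: 72.5
queue_current_capacity() {
  sed -n 's/.*CurrentCapacity: \([0-9.]*\).*/\1/p' | head -n 1
}

# Typical use on a node where the mapred script is on PATH:
#   mapred queue -info default | queue_current_capacity
```

If the input holds no CurrentCapacity field, the function prints nothing, so callers should treat empty output as "unknown".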
Administration Commands
Commands useful for administrators of a Hadoop cluster.
historyserver
Starts the JobHistoryServer service.
Usage: mapred historyserver
The JobHistoryServer can also be started or stopped with sbin/mr-jobhistory-daemon.sh start|stop historyserver.
hsadmin
Runs hsadmin to execute JobHistoryServer administrative commands.
Usage: mapred hsadmin [-refreshUserToGroupsMappings] | [-refreshSuperUserGroupsConfiguration] | [-refreshAdminAcls] | [-refreshLoadedJobCache] | [-refreshLogRetentionSettings] | [-refreshJobRetentionSettings] | [-getGroups [username]] | [-help [cmd]]
[hadoop@hadoopcluster78 bin]$ mapred hsadmin -getGroups hadoop
hadoop : clustergroup
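The -getGroups output above can be checked from a script, for example to confirm that a user belongs to an expected group. This is a minimal sketch, assuming each output line has the form "user : group1 group2 ..." with groups separated by spaces (the transcript shows only a single group). The function name user_in_group is hypothetical.

```shell
#!/usr/bin/env bash
# user_in_group: read `mapred hsadmin -getGroups <user>` output on stdin and
# succeed if <group> appears after the ":" separator.
# Assumption: groups are separated by spaces, as in "user : group1 group2".
user_in_group() {
  local group="$1" name sep groups g
  while read -r name sep groups; do
    # Skip lines that do not look like "user : groups", such as log lines.
    [ "$sep" = ":" ] || continue
    for g in $groups; do
      if [ "$g" = "$group" ]; then
        return 0
      fi
    done
  done
  return 1
}

# Typical use on a node where the mapred script is on PATH:
#   mapred hsadmin -getGroups hadoop | user_in_group clustergroup && echo "member"
```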