Hadoop Local (Standalone) Mode


There is already plenty of material on installing and configuring Hadoop; the official quickstart is at http://hadoop.apache.org/common/docs/r0.19.2/cn/quickstart.html.


Even so, problems come up frequently. Local (standalone) mode is the recommended mode for development, but running it on Windows under Cygwin is particularly error-prone.


Hadoop has three operating modes:

  • Standalone (local) mode
  • Pseudo-distributed mode
  • Fully distributed mode

The distributed modes add a NameNode and a JobTracker, which by default listen on local ports (localhost:9000 and localhost:9001). In those modes you cannot simply run a jar file with the hadoop command; HDFS and the JobTracker must be started first (for example via start-all.sh).
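The two ports mentioned above correspond to two configuration properties in Hadoop 0.20.x. A minimal pseudo-distributed configuration can be sketched as follows; the conf-demo directory name is illustrative, and in a real installation these files live under conf/:

```shell
# Sketch of the two config files behind localhost:9000 / localhost:9001.
# CONF_DIR is a throwaway demo directory, not a real Hadoop conf path.
CONF_DIR=conf-demo
mkdir -p "$CONF_DIR"

# core-site.xml: the default filesystem, i.e. the NameNode address.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# mapred-site.xml: where job submission finds the JobTracker.
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
```

In standalone mode both files are left empty, which is exactly why no daemons need to be running.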

Hadoop runs in standalone mode by default, so after installation you only need to set JAVA_HOME. The setting lives in conf/hadoop-env.sh; point it at your Java installation path. A Windows-style install path also works, since Cygwin translates the path on access. If the path contains spaces, wrap the whole path in quotes, e.g. export JAVA_HOME='E:/Java/jdk16'.
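The need for quoting can be verified with plain shell semantics, independent of Hadoop. A minimal sketch (the JDK directory here is made up):

```shell
# Why a JAVA_HOME containing spaces must be quoted: unquoted expansion
# would split the path at the space into two words.
JAVA_HOME='/tmp/hadoop-demo/Program Files/Java/jdk16'   # hypothetical path with a space
mkdir -p "$JAVA_HOME/bin"                               # quoted: created as one directory

# Quoted expansion keeps the path intact, so the directory test succeeds.
if [ -d "$JAVA_HOME/bin" ]; then result=ok; else result=broken; fi
echo "$result"
```

The same rule is why the example in hadoop-env.sh puts the whole value inside quotes.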

You can then try the example from the quickstart:
$ mkdir input 
$ cp conf/*.xml input 
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+' 
$ cat output/*
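What the grep example computes is easy to state: the map phase extracts every match of the regex 'dfs[a-z.]+' from the input files, and the reduce phase counts occurrences per matched string. The same result can be approximated with an ordinary pipeline; the file and directory names below are made up for the demo:

```shell
# Approximates Hadoop's grep example without Hadoop: extract every match
# of 'dfs[a-z.]+', then count occurrences of each distinct match.
mkdir -p demo-input
cat > demo-input/sample.xml <<'EOF'
<name>dfs.replication</name>
<name>dfs.permissions</name>
<name>dfs.replication</name>
EOF

# -o prints only the matched text, one match per line; uniq -c counts
# adjacent duplicates, so the stream must be sorted first.
grep -oE 'dfs[a-z.]+' demo-input/*.xml | sort | uniq -c | sort -rn > demo-output.txt
cat demo-output.txt
```

This is exactly the map (extract), shuffle (sort), reduce (count) structure the log below walks through, which makes the log much easier to read.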

If everything is set up correctly, the third command prints the job log:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
12/01/08 13:52:03 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/01/08 13:52:03 INFO mapred.FileInputFormat: Total input paths to process : 4
12/01/08 13:52:04 INFO mapred.JobClient: Running job: job_local_0001
12/01/08 13:52:04 INFO mapred.FileInputFormat: Total input paths to process : 4
12/01/08 13:52:04 INFO mapred.MapTask: numReduceTasks: 1
12/01/08 13:52:04 INFO mapred.MapTask: io.sort.mb = 100
12/01/08 13:52:04 INFO mapred.MapTask: data buffer = 79691776/99614720
12/01/08 13:52:04 INFO mapred.MapTask: record buffer = 262144/327680
12/01/08 13:52:04 INFO mapred.MapTask: Starting flush of map output
12/01/08 13:52:04 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/01/08 13:52:04 INFO mapred.LocalJobRunner: file:/E:/Apache/hadoop/run/hadoop-0.20.2/input/capacity-scheduler.xml:0+3936
12/01/08 13:52:04 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
12/01/08 13:52:05 INFO mapred.MapTask: numReduceTasks: 1
12/01/08 13:52:05 INFO mapred.MapTask: io.sort.mb = 100
12/01/08 13:52:05 INFO mapred.MapTask: data buffer = 79691776/99614720
12/01/08 13:52:05 INFO mapred.MapTask: record buffer = 262144/327680
12/01/08 13:52:05 INFO mapred.MapTask: Starting flush of map output
12/01/08 13:52:05 INFO mapred.MapTask: Finished spill 0
12/01/08 13:52:05 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/01/08 13:52:05 INFO mapred.LocalJobRunner: file:/E:/Apache/hadoop/run/hadoop-0.20.2/input/core-site.xml:0+13486
12/01/08 13:52:05 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
12/01/08 13:52:05 INFO mapred.MapTask: numReduceTasks: 1
12/01/08 13:52:05 INFO mapred.MapTask: io.sort.mb = 100
12/01/08 13:52:05 INFO mapred.JobClient:  map 100% reduce 0%
12/01/08 13:52:05 INFO mapred.MapTask: data buffer = 79691776/99614720
12/01/08 13:52:05 INFO mapred.MapTask: record buffer = 262144/327680
12/01/08 13:52:05 INFO mapred.MapTask: Starting flush of map output
12/01/08 13:52:05 INFO mapred.MapTask: Finished spill 0
12/01/08 13:52:05 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
12/01/08 13:52:05 INFO mapred.LocalJobRunner: file:/E:/Apache/hadoop/run/hadoop-0.20.2/input/hdfs-default.xml:0+10686
12/01/08 13:52:05 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000002_0' done.
12/01/08 13:52:05 INFO mapred.MapTask: numReduceTasks: 1
12/01/08 13:52:05 INFO mapred.MapTask: io.sort.mb = 100
12/01/08 13:52:05 INFO mapred.MapTask: data buffer = 79691776/99614720
12/01/08 13:52:05 INFO mapred.MapTask: record buffer = 262144/327680
12/01/08 13:52:05 INFO mapred.MapTask: Starting flush of map output
12/01/08 13:52:05 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
12/01/08 13:52:05 INFO mapred.LocalJobRunner: file:/E:/Apache/hadoop/run/hadoop-0.20.2/input/mapred-site.xml:0+29907
12/01/08 13:52:05 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000003_0' done.
12/01/08 13:52:05 INFO mapred.LocalJobRunner:
12/01/08 13:52:05 INFO mapred.Merger: Merging 4 sorted segments
12/01/08 13:52:05 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 1610 bytes
12/01/08 13:52:05 INFO mapred.LocalJobRunner:
12/01/08 13:52:05 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/01/08 13:52:05 INFO mapred.LocalJobRunner:
12/01/08 13:52:05 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/01/08 13:52:05 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/E:/Apache/hadoop/run/hadoop-0.20.2/grep-temp-1885996921
12/01/08 13:52:05 INFO mapred.LocalJobRunner: reduce > reduce
12/01/08 13:52:05 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
12/01/08 13:52:06 INFO mapred.JobClient:  map 100% reduce 100%
12/01/08 13:52:06 INFO mapred.JobClient: Job complete: job_local_0001
12/01/08 13:52:06 INFO mapred.JobClient: Counters: 13
12/01/08 13:52:06 INFO mapred.JobClient:   FileSystemCounters
12/01/08 13:52:06 INFO mapred.JobClient:     FILE_BYTES_READ=947721
12/01/08 13:52:06 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=866633
12/01/08 13:52:06 INFO mapred.JobClient:   Map-Reduce Framework
12/01/08 13:52:06 INFO mapred.JobClient:     Reduce input groups=49
12/01/08 13:52:06 INFO mapred.JobClient:     Combine output records=49
12/01/08 13:52:06 INFO mapred.JobClient:     Map input records=1776
12/01/08 13:52:06 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/01/08 13:52:06 INFO mapred.JobClient:     Reduce output records=49
12/01/08 13:52:06 INFO mapred.JobClient:     Spilled Records=98
12/01/08 13:52:06 INFO mapred.JobClient:     Map output bytes=1576
12/01/08 13:52:06 INFO mapred.JobClient:     Map input bytes=58015
12/01/08 13:52:06 INFO mapred.JobClient:     Combine input records=53
12/01/08 13:52:06 INFO mapred.JobClient:     Map output records=53
12/01/08 13:52:06 INFO mapred.JobClient:     Reduce input records=49
12/01/08 13:52:06 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
12/01/08 13:52:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/E:/Apache/hadoop/run/hadoop-0.20.2/output already exists
        at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:111)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:772)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
        at org.apache.hadoop.examples.Grep.run(Grep.java:84)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.Grep.main(Grep.java:93)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

The exception caught at the end occurs because the output directory already exists. This is part of Hadoop's design philosophy: a job refuses to write into an existing output directory, so results from a previous run can never be silently overwritten.
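The standard workaround is to remove the old output directory (or pick a fresh name) before resubmitting. A minimal sketch; the directory name is illustrative and the hadoop command is shown commented out:

```shell
# Hadoop refuses to overwrite an existing output directory, so clear it
# before re-running the job. OUTPUT_DIR here is a throwaway demo name.
OUTPUT_DIR=demo-output-dir
mkdir -p "$OUTPUT_DIR"          # simulate leftovers from a previous run

rm -rf "$OUTPUT_DIR"            # remove it before resubmitting
# bin/hadoop jar hadoop-*-examples.jar grep input "$OUTPUT_DIR" 'dfs[a-z.]+'
[ ! -e "$OUTPUT_DIR" ] && echo "output directory cleared"
```

Alternatively, parameterize the output path with a timestamp so each run writes somewhere new.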
