hadoop-2.4.0 wordcount


I already have a pseudo-distributed Hadoop setup on my Ubuntu 13.10 machine; now I want to run the most basic test program.

First, some context: my machine's hostname is Tank, and my account on it is joe.

My Hadoop installation lives under /usr/local/ on the local machine.
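Before touching the examples, it's worth making sure the pseudo-distributed daemons are actually up. A quick sanity check with jps (assuming HDFS and YARN were started with sbin/start-dfs.sh and sbin/start-yarn.sh) should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager:

joe@Tank:/usr/local/hadoop-2.4.0$ jps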

OK. First we need to know where the examples are shipped: they are under /usr/local/hadoop-2.4.0/share/hadoop/. Run:

joe@Tank:/usr/local/hadoop-2.4.0$ ./bin/hadoop
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.
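Before going further, it's easy to confirm that the examples jar is really there; with this install layout, a find turns it up:

joe@Tank:/usr/local/hadoop-2.4.0$ find share/hadoop/mapreduce -name 'hadoop-mapreduce-examples-*.jar'
share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar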

You can see that it can run a jar file, so next run:


joe@Tank:/usr/local/hadoop-2.4.0$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

You can see the wordcount example four lines from the bottom of the list.
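Invoking wordcount with no arguments is a harmless way to see what it expects; it prints a short usage message (something along the lines of Usage: wordcount <in> <out>) and exits:

joe@Tank:/usr/local/hadoop-2.4.0$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount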


Then create a text file healtheworld.txt on the local desktop, containing the lyrics of "Heal the World". Next, create an input directory on HDFS with the following command:

joe@Tank:/usr/local/hadoop-2.4.0$ hadoop fs -mkdir /input
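To confirm the directory landed on HDFS rather than the local filesystem, list the HDFS root; the listing should now contain an /input entry owned by joe:

joe@Tank:/usr/local/hadoop-2.4.0$ hadoop fs -ls /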

After that, upload the file into /input:

joe@Tank:/usr/local/hadoop-2.4.0$ hadoop fs -put ~/Desktop/healtheworld.txt /input
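To verify the upload, -ls and -cat work on HDFS paths much like their local namesakes:

joe@Tank:/usr/local/hadoop-2.4.0$ hadoop fs -ls /input
joe@Tank:/usr/local/hadoop-2.4.0$ hadoop fs -cat /input/healtheworld.txt | head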

Then run:

joe@Tank:/usr/local/hadoop-2.4.0$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount /input output 
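Two things about the output argument are worth noting. First, output is a relative path, so HDFS resolves it against the submitting user's home directory, i.e. /user/joe/output. Second, MapReduce refuses to start a job whose output directory already exists, which is presumably why the log below comes from a re-run that used out1 instead; to reuse a name, delete the old directory first:

joe@Tank:/usr/local/hadoop-2.4.0$ hadoop fs -rm -r /user/joe/output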


Here is the output:

joe@Tank:/usr/local/hadoop-2.4.0$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount /input out1
14/05/28 23:00:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/28 23:00:31 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/28 23:00:31 INFO input.FileInputFormat: Total input paths to process : 1
14/05/28 23:00:32 INFO mapreduce.JobSubmitter: number of splits:1
14/05/28 23:00:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1401271044129_0003
14/05/28 23:00:32 INFO impl.YarnClientImpl: Submitted application application_1401271044129_0003
14/05/28 23:00:32 INFO mapreduce.Job: The url to track the job: http://Tank:8088/proxy/application_1401271044129_0003/
14/05/28 23:00:32 INFO mapreduce.Job: Running job: job_1401271044129_0003
14/05/28 23:00:38 INFO mapreduce.Job: Job job_1401271044129_0003 running in uber mode : false
14/05/28 23:00:38 INFO mapreduce.Job: map 0% reduce 0%
14/05/28 23:00:42 INFO mapreduce.Job: map 100% reduce 0%
14/05/28 23:00:47 INFO mapreduce.Job: map 100% reduce 100%
14/05/28 23:00:47 INFO mapreduce.Job: Job job_1401271044129_0003 completed successfully
14/05/28 23:00:48 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=1758
        FILE: Number of bytes written=189201
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2025
        HDFS: Number of bytes written=1147
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=1933
        Total time spent by all reduces in occupied slots (ms)=2207
        Total time spent by all map tasks (ms)=1933
        Total time spent by all reduce tasks (ms)=2207
        Total vcore-seconds taken by all map tasks=1933
        Total vcore-seconds taken by all reduce tasks=2207
        Total megabyte-seconds taken by all map tasks=1979392
        Total megabyte-seconds taken by all reduce tasks=2259968
    Map-Reduce Framework
        Map input records=85
        Map output records=390
        Map output bytes=3463
        Map output materialized bytes=1758
        Input split bytes=109
        Combine input records=390
        Combine output records=153
        Reduce input groups=153
        Reduce shuffle bytes=1758
        Reduce input records=153
        Reduce output records=153
        Spilled Records=306
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=28
        CPU time spent (ms)=1250
        Physical memory (bytes) snapshot=421142528
        Virtual memory (bytes) snapshot=1451544576
        Total committed heap usage (bytes)=402653184
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1916
    File Output Format Counters
        Bytes Written=1147
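A couple of counters are worth a sanity check: Map input records=85 means the lyrics file has 85 lines, and Reduce output records=153 matches the 153 distinct words in the listing further down. You can also list the output directory first; a successful job leaves an empty _SUCCESS marker next to the part-r-00000 file:

joe@Tank:/usr/local/hadoop-2.4.0$ hadoop fs -ls /user/joe/output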

Again, note that the output directory is under /user/joe/ on HDFS.

Check the result:

joe@Tank:/usr/local/hadoop-2.4.0$ hadoop fs -cat /user/joe/output/part-r-00000
14/05/28 23:12:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
... 6
And 8
Be 1
Brighter 1
Create 1
Fear 1
For 4
Heal 4
I 2
If 8
In 3
It 1
Let 1
Love 1
Love's 1
Make 11
Save 1
See 1
So 1
Then 2
There 5
There's 3
This 1
Though 1
To 1
Together 1
We 3
Will 2
Wound 1
You 1
You'll 1
a 16
again 1
all 1
always 1
and 9
are 7
be 1
believed 1
better 11
bliss 1
brothers 1
cannot 2
care 5
cared 1
cares 1
children 1
conceived 1
could 3
crucify 1
cry 2
die 1
do 1
dread 1
dream 1
dying 4
earth, 1
enough 6
entire 3
existing 1
face 1
fear 1
feel 3
feels 1
find 1
fly 1
for 19
get 2
giving 1
glow 1
god's 1
grace 1
growing 1
happy 1
heart 2
heavenly 1
high 1
human 3
hurt 1
if 1
in 4
into 1
is 3
it 7
it's 1
its 1
joyful 2
keep 1
know 2
lie 1
life 1
little 2
living 8
love 2
make 2
me 8
much 1
my 2
nations 1
need 1
neough 1
never 1
no 3
of 1
once 1
only 1
or 2
our 2
people 3
pepole 1
place 13
plain 1
plowshares 1
race 3
really 2
reveal 1
see 2
shall 1
shine 1
so 1
sorrow 1
soul 1
space 2
spirits 1
start 1
stop 1
strangling 1
strong 1
swords 1
tears 1
than 1
that 3
the 16
their 1
there 2
there's 1
this 4
to 4
tomorrow 1
try 2
turn 1
us 1
want 1
ways 1
we 7
we'll 1
were 1
why 2
with 1
world 8
you 16
you'll 1
your 1
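Finally, if you'd rather have the result on the local filesystem than in HDFS, -get copies the part file out (the local name wordcount-result.txt here is just an example; -getmerge would instead concatenate multiple part files into one local file):

joe@Tank:/usr/local/hadoop-2.4.0$ hadoop fs -get /user/joe/output/part-r-00000 ~/Desktop/wordcount-result.txt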

Done.