Testing a MapReduce Program


The code below is from Hadoop in Action (《Hadoop实战》). The test steps are as follows.

1. With the Hadoop environment already set up, append `export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH` to the end of /etc/profile so that the import statements in the code can resolve. The code is as follows:

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class WordCount{
    // Mapper: emit (word, 1) for every whitespace-delimited token in a line
    public static class Map extends MapReduceBase implements Mapper<LongWritable,Text,Text,IntWritable>{
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        public void map(LongWritable key, Text value, OutputCollector<Text,IntWritable> output, Reporter reporter) throws IOException{
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while(tokenizer.hasMoreTokens()){
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }
    // Reducer: sum the 1s collected for each word
    public static class Reduce extends MapReduceBase implements Reducer<Text,IntWritable,Text,IntWritable>{
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text,IntWritable> output, Reporter reporter) throws IOException{
            int sum = 0;
            while(values.hasNext()){
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception{
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
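Before submitting to the cluster, the map/reduce logic can be sanity-checked locally with plain Java and no Hadoop dependencies. This is a sketch (the class name `LocalWordCount` and helper `count` are illustrative, not part of the original program); it applies the same whitespace tokenization as the Mapper and the same summing as the Reducer to the two test lines used in step 3:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Mirrors the Mapper (whitespace tokenization) and the Reducer
    // (summing the 1s per key) from the job above, without Hadoop.
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The same two test lines written to file01 and file02 in step 3
        String[] lines = {"Hello World Bye World", "Hello Hadoop Goodbye Hadoop"};
        count(lines).forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```

The per-word totals this prints (Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2) should match the merged contents of the part files in step 4.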
2. Save the code as /home/hadoop/codetest/helloworld/WordCount.java

3. In /home/hadoop/codetest/helloworld/ run the following commands:

$ javac WordCount.java

$ jar -cvf wordcount.jar -C ./ .

$ hadoop dfs -mkdir /user

$ hadoop dfs -mkdir /user/hadoop

$ hadoop dfs -mkdir /user/hadoop/input

$ echo "Hello World Bye World" > file01

$ echo "Hello Hadoop Goodbye Hadoop" > file02

$ hadoop dfs -put file* /user/hadoop/input/

$ rm -f file*

$ hadoop jar wordcount.jar WordCount input output

When re-running the test, delete the previous output directory first:

$ hadoop dfs -rm -r /user/hadoop/output
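Note that on Hadoop 2.x the `hadoop dfs` form still works but prints DEPRECATED warnings, as the cat output in step 4 shows. The non-deprecated equivalents of the filesystem commands used here would be along these lines (assuming the same /user/hadoop paths):

```shell
# Non-deprecated hdfs equivalents of the hadoop dfs commands above.
# -p creates the parent directories in one step.
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put file* /user/hadoop/input/
hdfs dfs -rm -r /user/hadoop/output
hdfs dfs -cat /user/hadoop/output/part-00000
```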


4. Results

16/11/22 16:29:42 INFO client.RMProxy: Connecting to ResourceManager at hadoop-namenode/192.168.137.11:8032
16/11/22 16:29:42 INFO client.RMProxy: Connecting to ResourceManager at hadoop-namenode/192.168.137.11:8032
16/11/22 16:29:43 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/11/22 16:29:43 INFO mapred.FileInputFormat: Total input paths to process : 2
16/11/22 16:29:43 INFO mapreduce.JobSubmitter: number of splits:25
16/11/22 16:29:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1479800054795_0003
16/11/22 16:29:44 INFO impl.YarnClientImpl: Submitted application application_1479800054795_0003
16/11/22 16:29:44 INFO mapreduce.Job: The url to track the job: http://hadoop-namenode:8088/proxy/application_1479800054795_0003/
16/11/22 16:29:44 INFO mapreduce.Job: Running job: job_1479800054795_0003
16/11/22 16:29:51 INFO mapreduce.Job: Job job_1479800054795_0003 running in uber mode : false
16/11/22 16:29:51 INFO mapreduce.Job:  map 0% reduce 0%
16/11/22 16:30:19 INFO mapreduce.Job:  map 12% reduce 0%
16/11/22 16:30:21 INFO mapreduce.Job:  map 16% reduce 0%
16/11/22 16:30:23 INFO mapreduce.Job:  map 20% reduce 0%
16/11/22 16:30:25 INFO mapreduce.Job:  map 28% reduce 0%
16/11/22 16:30:26 INFO mapreduce.Job:  map 35% reduce 0%
16/11/22 16:30:27 INFO mapreduce.Job:  map 36% reduce 0%
16/11/22 16:30:32 INFO mapreduce.Job:  map 40% reduce 0%
16/11/22 16:30:37 INFO mapreduce.Job:  map 43% reduce 0%
16/11/22 16:30:38 INFO mapreduce.Job:  map 44% reduce 0%
16/11/22 16:30:40 INFO mapreduce.Job:  map 53% reduce 0%
16/11/22 16:30:41 INFO mapreduce.Job:  map 60% reduce 0%
16/11/22 16:30:46 INFO mapreduce.Job:  map 68% reduce 0%
16/11/22 16:30:49 INFO mapreduce.Job:  map 84% reduce 0%
16/11/22 16:30:54 INFO mapreduce.Job:  map 96% reduce 0%
16/11/22 16:31:09 INFO mapreduce.Job:  map 100% reduce 0%
16/11/22 16:31:13 INFO mapreduce.Job:  map 100% reduce 25%
16/11/22 16:31:14 INFO mapreduce.Job:  map 100% reduce 50%
16/11/22 16:31:16 INFO mapreduce.Job:  map 100% reduce 100%
16/11/22 16:31:18 INFO mapreduce.Job: Job job_1479800054795_0003 completed successfully
16/11/22 16:31:18 INFO mapreduce.Job: Counters: 50
        File System Counters
                FILE: Number of bytes read=122
                FILE: Number of bytes written=3353307
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=2942
                HDFS: Number of bytes written=41
                HDFS: Number of read operations=87
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=8
        Job Counters
                Launched map tasks=25
                Launched reduce tasks=4
                Data-local map tasks=23
                Rack-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=1080282
                Total time spent by all reduces in occupied slots (ms)=81957
                Total time spent by all map tasks (ms)=1080282
                Total time spent by all reduce tasks (ms)=81957
                Total vcore-seconds taken by all map tasks=1080282
                Total vcore-seconds taken by all reduce tasks=81957
                Total megabyte-seconds taken by all map tasks=1106208768
                Total megabyte-seconds taken by all reduce tasks=83923968
        Map-Reduce Framework
                Map input records=2
                Map output records=8
                Map output bytes=82
                Map output materialized bytes=698
                Input split bytes=2600
                Combine input records=0
                Combine output records=0
                Reduce input groups=5
                Reduce shuffle bytes=698
                Reduce input records=8
                Reduce output records=5
                Spilled Records=16
                Shuffled Maps =100
                Failed Shuffles=0
                Merged Map outputs=100
                GC time elapsed (ms)=4020
                CPU time spent (ms)=22170
                Physical memory (bytes) snapshot=6518628352
                Virtual memory (bytes) snapshot=25351811072
                Total committed heap usage (bytes)=4766302208
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=342
        File Output Format Counters
                Bytes Written=41

$ hadoop dfs -cat /user/hadoop/output/part-00000
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Goodbye    1
$ hadoop dfs -cat /user/hadoop/output/part-00001
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Bye    1
Hello    2
World    2
$ hadoop dfs -cat /user/hadoop/output/part-00002
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Hadoop    2
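The counts are spread across several part files because the job ran with four reduce tasks (Launched reduce tasks=4 in the counters above); Hadoop's default HashPartitioner routes each key to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. A minimal sketch of that formula, using `String.hashCode` for illustration (Hadoop's `Text` hashes the raw UTF-8 bytes, so the actual assignments differ; the class name `PartitionSketch` is made up):

```java
public class PartitionSketch {
    // Same formula as Hadoop's default HashPartitioner:
    // mask off the sign bit, then take the remainder modulo the reducer count
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] words = {"Hello", "World", "Bye", "Hadoop", "Goodbye"};
        for (String w : words) {
            // Which of the 4 part files this key would go to
            // (illustrative only: String.hashCode != Text.hashCode)
            System.out.println(w + " -> part-0000" + partition(w, 4));
        }
    }
}
```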

