Hadoop Fully Distributed Environment Setup and Running the Word Count Program


1. Hadoop Fully Distributed Environment Setup


Following the two links below, I set up a cluster with a Mac as the master and two Ubuntu machines as slaves, and also tried a configuration with one Ubuntu machine as the master and two Ubuntu machines as slaves.

Local environment:

    macOS host

    Parallels Desktop running 3 Ubuntu VMs

If you run into any problems with the setup, feel free to discuss!

References:

    http://blog.csdn.net/wk51920/article/details/51686038

    http://www.w2bc.com/Article/19645


Commands used:

javac -classpath /usr/hadoop/hadoop-2.8.0/share/hadoop/common/hadoop-common-2.8.0.jar:/usr/hadoop/hadoop-2.8.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.8.0.jar WordCount.java -d classes
jar -cvf WordCount.jar *
hadoop fs -rm -r /output
hadoop jar WordCount.jar WordCount /input/count.txt /output



2. Running the Word Count Program


Original code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by zhuqiuhui on 2017/7/14.
 */
public class WordCount extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        IntWritable one = new IntWritable(1);
        Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new WordCount(), args);
        System.exit(exitCode);
    }
}

Input text:

hadoop mapreduce   hadoop yarn
hadoop hdfs
hadoop mapreduce   hadoop yarn
hadoop hdfs
zqh gkn
lzy zqh
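The mapper splits each input line with StringTokenizer, whose default delimiters collapse runs of whitespace, so the extra spaces in the sample input never produce empty tokens. A small standalone check (class name is illustrative, not from the original post):

```java
import java.util.StringTokenizer;

public class TokenizeDemo {
    public static void main(String[] args) {
        // Same splitting the mapper performs on one input line:
        // default delimiters are whitespace, and consecutive
        // delimiters are treated as one separator.
        StringTokenizer itr = new StringTokenizer("hadoop mapreduce   hadoop yarn");
        int n = 0;
        while (itr.hasMoreTokens()) {
            System.out.println(itr.nextToken());
            n++;
        }
        System.out.println(n); // 4 tokens despite the triple space
    }
}
```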


Error:

17/07/20 04:22:35 INFO client.RMProxy: Connecting to ResourceManager at master/10.211.55.5:8032
17/07/20 04:22:35 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/07/20 04:22:36 INFO input.FileInputFormat: Total input files to process : 1
17/07/20 04:22:36 INFO mapreduce.JobSubmitter: number of splits:1
17/07/20 04:22:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500533815972_0001
17/07/20 04:22:37 INFO impl.YarnClientImpl: Submitted application application_1500533815972_0001
17/07/20 04:22:37 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1500533815972_0001/
17/07/20 04:22:37 INFO mapreduce.Job: Running job: job_1500533815972_0001
17/07/20 04:22:46 INFO mapreduce.Job: Job job_1500533815972_0001 running in uber mode : false
17/07/20 04:22:46 INFO mapreduce.Job:  map 0% reduce 0%
17/07/20 04:22:50 INFO mapreduce.Job: Task Id : attempt_1500533815972_0001_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
	at java.lang.Class.getConstructor0(Class.java:3082)
	at java.lang.Class.getDeclaredConstructor(Class.java:2178)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
	... 7 more
17/07/20 04:22:53 INFO mapreduce.Job: Task Id : attempt_1500533815972_0001_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
	[same stack trace as above]
17/07/20 04:22:59 INFO mapreduce.Job: Task Id : attempt_1500533815972_0001_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
	[same stack trace as above]
17/07/20 04:23:04 INFO mapreduce.Job:  map 100% reduce 100%
17/07/20 04:23:04 INFO mapreduce.Job: Job job_1500533815972_0001 failed with state FAILED due to: Task failed task_1500533815972_0001_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
17/07/20 04:23:04 INFO mapreduce.Job: Counters: 13
	Job Counters 
		Failed map tasks=4
		Killed reduce tasks=1
		Launched map tasks=4
		Other local map tasks=3
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=11149
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=11149
		Total time spent by all reduce tasks (ms)=0
		Total vcore-milliseconds taken by all map tasks=11149
		Total vcore-milliseconds taken by all reduce tasks=0
		Total megabyte-milliseconds taken by all map tasks=11416576
		Total megabyte-milliseconds taken by all reduce tasks=0


Cause:

The MapReduce job fails because the map and reduce classes are not declared static. Hadoop instantiates the Mapper and Reducer via reflection, looking for a no-argument constructor. A non-static inner class has no such constructor: its constructor takes a hidden reference to an enclosing WordCount instance, so ReflectionUtils cannot create an instance and throws NoSuchMethodException: WordCount$TokenizerMapper.&lt;init&gt;().
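The failure can be reproduced outside Hadoop with a few lines of reflection. This is a minimal sketch (class and method names here are illustrative, not from the original post) of the lookup that ReflectionUtils.newInstance performs:

```java
import java.lang.reflect.Constructor;

public class NestedReflection {
    public class Inner {}          // like the original TokenizerMapper (non-static)
    public static class Nested {}  // like the fixed TokenizerMapper (static)

    static boolean hasNoArgCtor(Class<?> c) {
        try {
            // Hadoop's ReflectionUtils does essentially this lookup
            // before calling newInstance() on the mapper/reducer class.
            Constructor<?> ctor = c.getDeclaredConstructor();
            ctor.setAccessible(true);
            return true;
        } catch (NoSuchMethodException e) {
            // Inner's only constructor is Inner(NestedReflection outer),
            // so the zero-argument lookup fails -- the job's error.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgCtor(Inner.class));   // false
        System.out.println(hasNoArgCtor(Nested.class));  // true
    }
}
```

Marking the nested classes static removes the hidden enclosing-instance parameter, giving them the default no-argument constructor that reflection needs.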


Corrected code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by zhuqiuhui on 2017/7/14.
 */
public class WordCount extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        IntWritable one = new IntWritable(1);
        Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new WordCount(), args);
        System.exit(exitCode);
    }
}


Normal output:

17/07/20 04:39:15 INFO client.RMProxy: Connecting to ResourceManager at master/10.211.55.5:8032
17/07/20 04:39:16 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/07/20 04:39:16 INFO input.FileInputFormat: Total input files to process : 1
17/07/20 04:39:16 INFO mapreduce.JobSubmitter: number of splits:1
17/07/20 04:39:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500533815972_0002
17/07/20 04:39:16 INFO impl.YarnClientImpl: Submitted application application_1500533815972_0002
17/07/20 04:39:16 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1500533815972_0002/
17/07/20 04:39:16 INFO mapreduce.Job: Running job: job_1500533815972_0002
17/07/20 04:39:22 INFO mapreduce.Job: Job job_1500533815972_0002 running in uber mode : false
17/07/20 04:39:22 INFO mapreduce.Job:  map 0% reduce 0%
17/07/20 04:39:28 INFO mapreduce.Job:  map 100% reduce 0%
17/07/20 04:39:34 INFO mapreduce.Job:  map 100% reduce 100%
17/07/20 04:39:34 INFO mapreduce.Job: Job job_1500533815972_0002 completed successfully
17/07/20 04:39:34 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=87
		FILE: Number of bytes written=272771
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=202
		HDFS: Number of bytes written=53
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3192
		Total time spent by all reduces in occupied slots (ms)=2943
		Total time spent by all map tasks (ms)=3192
		Total time spent by all reduce tasks (ms)=2943
		Total vcore-milliseconds taken by all map tasks=3192
		Total vcore-milliseconds taken by all reduce tasks=2943
		Total megabyte-milliseconds taken by all map tasks=3268608
		Total megabyte-milliseconds taken by all reduce tasks=3013632
	Map-Reduce Framework
		Map input records=8
		Map output records=16
		Map output bytes=162
		Map output materialized bytes=87
		Input split bytes=99
		Combine input records=16
		Combine output records=7
		Reduce input groups=7
		Reduce shuffle bytes=87
		Reduce input records=7
		Reduce output records=7
		Spilled Records=14
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=169
		CPU time spent (ms)=1120
		Physical memory (bytes) snapshot=298622976
		Virtual memory (bytes) snapshot=3772493824
		Total committed heap usage (bytes)=140972032
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=103
	File Output Format Counters 
		Bytes Written=53

Text output part-r-00000:

gkn	1
hadoop	6
hdfs	2
lzy	1
mapreduce	2
yarn	2
zqh	2
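The output can be sanity-checked locally without a cluster by running the same tokenize-and-sum logic in memory. This sketch (class name illustrative) uses a TreeMap so the words come out in the same sorted order as the job's part file:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    public static Map<String, Integer> count(String[] lines) {
        // In-memory equivalent of the map + reduce phases:
        // tokenize each line, then sum per distinct word.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] input = {
            "hadoop mapreduce   hadoop yarn",
            "hadoop hdfs",
            "hadoop mapreduce   hadoop yarn",
            "hadoop hdfs",
            "zqh gkn",
            "lzy zqh"
        };
        // Prints the same seven word/count pairs as part-r-00000.
        count(input).forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```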