Hadoop Fully Distributed Environment Setup and Running a Word Count Program
I. Hadoop Fully Distributed Environment Setup
I mainly followed the two links below. I tried a setup with a Mac as the master and two Ubuntu VMs as slaves, and also one with an Ubuntu VM as the master and two Ubuntu VMs as slaves.
Local environment:
Mac
Parallels Desktop running three Ubuntu VMs
If you hit any configuration problems, feel free to discuss!
Reference links:
http://blog.csdn.net/wk51920/article/details/51686038
http://www.w2bc.com/Article/19645
Commands involved:
javac -classpath /usr/hadoop/hadoop-2.8.0/share/hadoop/common/hadoop-common-2.8.0.jar:/usr/hadoop/hadoop-2.8.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.8.0.jar WordCount.java -d classes
jar -cvf WordCount.jar *
hadoop fs -rm -r /output
hadoop jar WordCount.jar WordCount /input/count.txt /output
II. Running the Word Count Program
Original code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by zhuqiuhui on 2017/7/14.
 */
public class WordCount extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        IntWritable one = new IntWritable(1);
        Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new WordCount(), args);
        System.exit(exitCode);
    }
}
Input text:
hadoop mapreduce hadoop yarn
hadoop hdfs
hadoop mapreduce hadoop yarn
hadoop hdfs
zqh gkn
lzy zqh
Error output:
17/07/20 04:22:35 INFO client.RMProxy: Connecting to ResourceManager at master/10.211.55.5:8032
17/07/20 04:22:35 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/07/20 04:22:36 INFO input.FileInputFormat: Total input files to process : 1
17/07/20 04:22:36 INFO mapreduce.JobSubmitter: number of splits:1
17/07/20 04:22:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500533815972_0001
17/07/20 04:22:37 INFO impl.YarnClientImpl: Submitted application application_1500533815972_0001
17/07/20 04:22:37 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1500533815972_0001/
17/07/20 04:22:37 INFO mapreduce.Job: Running job: job_1500533815972_0001
17/07/20 04:22:46 INFO mapreduce.Job: Job job_1500533815972_0001 running in uber mode : false
17/07/20 04:22:46 INFO mapreduce.Job: map 0% reduce 0%
17/07/20 04:22:50 INFO mapreduce.Job: Task Id : attempt_1500533815972_0001_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
	at java.lang.Class.getConstructor0(Class.java:3082)
	at java.lang.Class.getDeclaredConstructor(Class.java:2178)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
	... 7 more
17/07/20 04:22:53 INFO mapreduce.Job: Task Id : attempt_1500533815972_0001_m_000000_1, Status : FAILED
(same stack trace as attempt _0)
17/07/20 04:22:59 INFO mapreduce.Job: Task Id : attempt_1500533815972_0001_m_000000_2, Status : FAILED
(same stack trace as attempt _0)
17/07/20 04:23:04 INFO mapreduce.Job: map 100% reduce 100%
17/07/20 04:23:04 INFO mapreduce.Job: Job job_1500533815972_0001 failed with state FAILED due to: Task failed task_1500533815972_0001_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
17/07/20 04:23:04 INFO mapreduce.Job: Counters: 13
	Job Counters
		Failed map tasks=4
		Killed reduce tasks=1
		Launched map tasks=4
		Other local map tasks=3
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=11149
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=11149
		Total time spent by all reduce tasks (ms)=0
		Total vcore-milliseconds taken by all map tasks=11149
		Total vcore-milliseconds taken by all reduce tasks=0
		Total megabyte-milliseconds taken by all map tasks=11416576
		Total megabyte-milliseconds taken by all reduce tasks=0
Cause:
The MapReduce job fails because the Mapper and Reducer classes are not declared static. Hadoop instantiates them via reflection, and a non-static inner class has no no-argument constructor that can be invoked without an enclosing instance, so the framework cannot create one and throws NoSuchMethodException.
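The failure can be reproduced outside Hadoop with plain reflection. This is a minimal sketch (the class and nested-class names are mine, not from the original code): a non-static inner class compiles with the enclosing instance as a hidden constructor parameter, so asking for a no-argument constructor throws exactly the NoSuchMethodException seen in the log, while a static nested class works fine.

```java
import java.lang.reflect.Constructor;

public class InnerClassReflectionDemo {
    // Like the buggy TokenizerMapper: a non-static inner class.
    class NonStaticMapper {}
    // Like the fixed version: a static nested class.
    static class StaticMapper {}

    public static void main(String[] args) {
        // A static nested class has a normal no-arg constructor,
        // so Hadoop-style reflective instantiation succeeds.
        try {
            Constructor<StaticMapper> ok = StaticMapper.class.getDeclaredConstructor();
            System.out.println("static nested class: no-arg constructor found");
        } catch (NoSuchMethodException e) {
            System.out.println("static nested class failed: " + e);
        }
        // A non-static inner class's only constructor is
        // NonStaticMapper(InnerClassReflectionDemo enclosing), so
        // getDeclaredConstructor() with no parameters throws.
        try {
            Constructor<NonStaticMapper> c = NonStaticMapper.class.getDeclaredConstructor();
            System.out.println("inner class: no-arg constructor found");
        } catch (NoSuchMethodException e) {
            System.out.println("inner class failed: NoSuchMethodException");
        }
    }
}
```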
Corrected code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by zhuqiuhui on 2017/7/14.
 */
public class WordCount extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        IntWritable one = new IntWritable(1);
        Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new WordCount(), args);
        System.exit(exitCode);
    }
}
Normal output:
17/07/20 04:39:15 INFO client.RMProxy: Connecting to ResourceManager at master/10.211.55.5:8032
17/07/20 04:39:16 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/07/20 04:39:16 INFO input.FileInputFormat: Total input files to process : 1
17/07/20 04:39:16 INFO mapreduce.JobSubmitter: number of splits:1
17/07/20 04:39:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500533815972_0002
17/07/20 04:39:16 INFO impl.YarnClientImpl: Submitted application application_1500533815972_0002
17/07/20 04:39:16 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1500533815972_0002/
17/07/20 04:39:16 INFO mapreduce.Job: Running job: job_1500533815972_0002
17/07/20 04:39:22 INFO mapreduce.Job: Job job_1500533815972_0002 running in uber mode : false
17/07/20 04:39:22 INFO mapreduce.Job: map 0% reduce 0%
17/07/20 04:39:28 INFO mapreduce.Job: map 100% reduce 0%
17/07/20 04:39:34 INFO mapreduce.Job: map 100% reduce 100%
17/07/20 04:39:34 INFO mapreduce.Job: Job job_1500533815972_0002 completed successfully
17/07/20 04:39:34 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=87
		FILE: Number of bytes written=272771
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=202
		HDFS: Number of bytes written=53
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3192
		Total time spent by all reduces in occupied slots (ms)=2943
		Total time spent by all map tasks (ms)=3192
		Total time spent by all reduce tasks (ms)=2943
		Total vcore-milliseconds taken by all map tasks=3192
		Total vcore-milliseconds taken by all reduce tasks=2943
		Total megabyte-milliseconds taken by all map tasks=3268608
		Total megabyte-milliseconds taken by all reduce tasks=3013632
	Map-Reduce Framework
		Map input records=8
		Map output records=16
		Map output bytes=162
		Map output materialized bytes=87
		Input split bytes=99
		Combine input records=16
		Combine output records=7
		Reduce input groups=7
		Reduce shuffle bytes=87
		Reduce input records=7
		Reduce output records=7
		Spilled Records=14
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=169
		CPU time spent (ms)=1120
		Physical memory (bytes) snapshot=298622976
		Virtual memory (bytes) snapshot=3772493824
		Total committed heap usage (bytes)=140972032
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=103
	File Output Format Counters
		Bytes Written=53
Output file part-r-00000:
gkn	1
hadoop	6
hdfs	2
lzy	1
mapreduce	2
yarn	2
zqh	2
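These counts can be sanity-checked locally without a cluster. This is a small sketch (the class name LocalWordCountCheck is mine) that tokenizes the same input lines with StringTokenizer, just as the Mapper does, and tallies them in a sorted map so the output order matches part-r-00000:

```java
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCountCheck {

    // Tokenize each line on whitespace (same as the Mapper) and count words.
    // TreeMap keeps keys sorted, matching the reducer's sorted output.
    public static TreeMap<String, Integer> count(String[] lines) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Mirrors the input text count.txt from above.
        String[] input = {
            "hadoop mapreduce hadoop yarn",
            "hadoop hdfs",
            "hadoop mapreduce hadoop yarn",
            "hadoop hdfs",
            "zqh gkn",
            "lzy zqh"
        };
        // Prints the same seven lines as part-r-00000.
        count(input).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```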