A MapReduce Example on Hadoop 2.7.3


Preface: whenever I learn something new, or tinker with a side project for fun, problems inevitably come up. I'm a bit compulsive: the OS and software on my phone and computer all have to be on the latest version, and it bothers me not to upgrade. When following along with videos, books, or forum posts, the software versions in the tutorial are rarely current, while mine usually are, so things break. This post records the problems I ran into, in the hope that it helps fellow sufferers.


In brief: this is a simple WordCount example. I don't recommend running it locally on Windows; either use a Linux environment, or package it as a jar and run it on the cluster, so that Windows environment issues don't waste your time.

pom.xml


<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>hadoop</groupId>
  <artifactId>hadoop</artifactId>
  <!-- jar packaging, so "mvn package" produces a jar we can submit with "hadoop jar" -->
  <packaging>jar</packaging>
  <version>0.0.1-SNAPSHOT</version>
  <name>hadoop Maven Webapp</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.7.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.7.3</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <build>
    <finalName>hadoop</finalName>
  </build>
</project>


The Mapper class:


package com.mapreduce;

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // One input line per call: split it on spaces and emit a (word, 1) pair per token.
        String line = value.toString();
        String[] words = StringUtils.split(line, " ");
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
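The map step above only tokenizes each line and emits a 1 per word. As a rough standalone sketch of that per-line behavior (using plain `String.split` in place of commons-lang `StringUtils.split`, and a `List` of tab-joined strings in place of the MapReduce `Context` — both substitutions are mine, for illustration only):

```java
import java.util.ArrayList;
import java.util.List;

public class MapSketch {
    // Mimics WCMapper.map for a single input line: emit one ("word", 1) pair per token.
    static List<String> mapLine(String line) {
        List<String> pairs = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (!word.isEmpty()) { // StringUtils.split likewise skips empty tokens
                pairs.add(word + "\t1");
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(mapLine("hello world hello"));
    }
}
```

Note that duplicates are emitted as-is ("hello" appears twice); summing them is the reducer's job, after the framework groups the pairs by key.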


The Reducer class:

package com.mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        // All counts for one word arrive together after the shuffle; sum them and emit the total.
        long count = 0;
        for (LongWritable value : values) {
            count += value.get();
        }
        context.write(key, new LongWritable(count));
    }
}
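The reduce step receives all of a word's 1s together and sums them. A minimal plain-Java sketch of the same shuffle-then-sum aggregation over a list of (word, count) pairs (the helper name and shape are mine, not part of the job):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReduceSketch {
    // Mimics the shuffle + WCReducer.reduce: group counts by word and sum them per key.
    static Map<String, Long> countWords(String[][] pairs) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            String word = pair[0];
            long count = Long.parseLong(pair[1]);
            totals.merge(word, count, Long::sum); // the sum loop from WCReducer, per key
        }
        return totals;
    }

    public static void main(String[] args) {
        String[][] mapOutput = { {"hello", "1"}, {"world", "1"}, {"hello", "1"} };
        System.out.println(countWords(mapOutput)); // {hello=2, world=1}
    }
}
```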


A standard Runner:

package com.mapreduce;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WCRunner extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // Use the Configuration injected by ToolRunner instead of creating a new one,
        // so generic options (-D key=value etc.) passed on the command line take effect.
        Job job = Job.getInstance(getConf());
        job.setJarByClass(WCRunner.class);

        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Note: do NOT call job.setPartitionerClass(null) -- passing null throws a
        // NullPointerException. Simply omit the call to get the default HashPartitioner.

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses the generic options before handing the rest to run().
        System.exit(ToolRunner.run(new WCRunner(), args));
    }
}


Build the jar and run the job on the cluster:


hadoop jar wc.jar com.mapreduce.WCRunner /opt/ /opt/out


The problem: org.apache.hadoop.mapreduce.lib.input.InvalidInputException


With older Hadoop tutorials you could pass a local file path directly, but on 2.7.3 the job failed with an error saying the input path did not exist on the HDFS filesystem; after changing it to an HDFS path, everything worked. The cause appears to be path resolution: a path without a scheme is qualified against the default filesystem configured in fs.defaultFS, which on a cluster points at HDFS rather than the local disk, so /opt/ is looked up in HDFS.

I could not find any note about this change in the 2.7.3 release notes (http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/releasenotes.html), so possibly the behavior had already changed before 2.7.3.
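To see why a scheme-less path lands on HDFS, it helps to think of it as URI resolution against the fs.defaultFS value. A rough plain-Java illustration using java.net.URI (the address hdfs://namenode:9000 is an assumed example value of fs.defaultFS, not from my cluster):

```java
import java.net.URI;

public class PathResolution {
    // Qualify a possibly scheme-less path against a default filesystem URI,
    // roughly what Hadoop does with paths handed to FileInputFormat.
    static URI qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path);
    }

    public static void main(String[] args) {
        // An unqualified path resolves onto the default filesystem (HDFS on a cluster)...
        System.out.println(qualify("hdfs://namenode:9000/", "/opt"));
        // ...while an explicit scheme always wins, which is how you point a job
        // at local files if you really want to.
        System.out.println(qualify("hdfs://namenode:9000/", "file:///opt"));
    }
}
```

So passing an explicit hdfs:// (or file://) URI on the command line removes the ambiguity entirely.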

Constantly running into problems while learning is a good thing; it's like knowing 99 ways that don't work. If everything went exactly as the tutorial said, with no problems at all, that would actually be scary.
