A MapReduce Example on Hadoop 2.7.3
Preface: whenever I learn something new, or tinker with something out of personal interest, problems inevitably come up. I have a compulsive streak: the OS and software on my phone and computer all have to be upgraded to the latest version, or it just feels wrong. When following along with a video, a book, or a blog post, things always break, because tutorials rarely use recent software versions while mine are fairly new. I'm recording the problems I ran into here, in the hope of helping fellow sufferers.
Brief: this is a simple WordCount example. I don't recommend running it locally on Windows; use a Linux environment, or package it as a jar and run it on the cluster, so that Windows environment issues don't waste your time.
pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>hadoop</groupId>
    <artifactId>hadoop</artifactId>
    <!-- jar packaging, since we build a job jar for the cluster (not a webapp) -->
    <packaging>jar</packaging>
    <version>0.0.1-SNAPSHOT</version>
    <name>hadoop</name>
    <url>http://maven.apache.org</url>
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.3</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <finalName>hadoop</finalName>
    </build>
</project>
The Mapper class:
package com.mapreduce;

import java.io.IOException;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each input record is one line of text; split it on spaces
        // and emit (word, 1) for every word.
        String line = value.toString();
        String[] words = StringUtils.split(line, " ");
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
The Reducer class:
package com.mapreduce;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all the 1s emitted by the mappers for this word.
        long count = 0;
        for (LongWritable value : values) {
            count += value.get();
        }
        context.write(key, new LongWritable(count));
    }
}
A standard Runner (driver) class:
package com.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WCRunner extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // Use the Configuration injected by ToolRunner (via getConf()) rather
        // than creating a new one, so -D and -conf options are honored.
        Job job = Job.getInstance(getConf());
        job.setJarByClass(WCRunner.class);

        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Note: do NOT call job.setPartitionerClass(null) -- that throws a
        // NullPointerException; the default HashPartitioner is used automatically.

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Let ToolRunner parse generic options and pass the Configuration in.
        System.exit(ToolRunner.run(new Configuration(), new WCRunner(), args));
    }
}
Package the jar and run the job on the cluster:
hadoop jar wc.jar com.mapreduce.WCRunner /opt/ /opt/out
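As a sketch of the full workflow, from building the jar to reading the output (the input file name and the /wordcount directories here are illustrative assumptions, not from the original post):

```shell
# Build the job jar with Maven (the <finalName> above produces target/hadoop.jar)
mvn clean package

# Create an input directory on HDFS and upload a text file (paths are illustrative)
hdfs dfs -mkdir -p /wordcount/in
hdfs dfs -put words.txt /wordcount/in

# Run the job; args[0] is the input dir, args[1] is the output dir,
# which must not exist yet or the job will fail
hadoop jar target/hadoop.jar com.mapreduce.WCRunner /wordcount/in /wordcount/out

# Inspect the result written by the single reducer
hdfs dfs -cat /wordcount/out/part-r-00000
```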
Problem: org.apache.hadoop.mapreduce.lib.input.InvalidInputException
Older versions of Hadoop could take a local file path directly, but on 2.7.3 the job failed with an error saying the path could not be found on the HDFS filesystem; after switching to an HDFS address, everything worked. In other words, a schemeless path like /opt/ is resolved against fs.defaultFS, so on a cluster whose default filesystem is hdfs:// it points at HDFS, not the local disk.
I could not find any note about this change in the 2.7.3 release notes (http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/releasenotes.html),
so perhaps the change predates 2.7.3.
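To avoid the ambiguity entirely, the filesystem scheme can be spelled out in the paths. The namenode host and port below are placeholders; substitute your cluster's fs.defaultFS value:

```shell
# Explicit HDFS paths (namenode:9000 is an assumption -- use your cluster's address)
hadoop jar wc.jar com.mapreduce.WCRunner hdfs://namenode:9000/opt hdfs://namenode:9000/opt/out

# Or force the local filesystem with the file:// scheme
hadoop jar wc.jar com.mapreduce.WCRunner file:///opt file:///opt/out
```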
Constantly running into problems while learning is a good thing: it's like having learned 99 ways that don't work. If everything went exactly as in the tutorial, with no problems at all, that would be truly frightening.