First Try at MapReduce
Source: Internet · Published by: 书生电子图书数据库 · Editor: 程序博客网 · Date: 2024/06/05 02:18
I was asked about MapReduce in an interview, so I set up a single-node MapReduce environment (Hadoop 2) to try it out.
Tools
IDEA, Maven, JDK 8
Maven Configuration
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>mitsuhide</groupId>
    <artifactId>javaAIA</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <hadoop.version>2.7.2</hadoop.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>

    <repositories>
        <repository>
            <id>apache</id>
            <url>http://maven.apache.org</url>
        </repository>
    </repositories>
</project>
Note that Hadoop 2 differs from Hadoop 1: Hadoop 1 shipped a single hadoop-core artifact, which is no longer used here.
Configure Input and Output
I created an input folder under the project path; the output folder is generated automatically by MapReduce, so there is no need to create it (in fact, the job fails if the output path already exists).
The program reads every file under the input folder, line by line.
Configure Run Parameters
Simply pass the input and output paths as the program arguments:
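For example, in IDEA's Run Configuration, the Program arguments field would contain the two paths (the relative paths below are assumptions; they resolve against the run configuration's working directory):

```
input output
```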
WordCount
package cn.bigdata.hadoop.mapreduce;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Created by baidu on 16/9/29.
 */
public class WordCount {

    // Mapper: splits each input line into whitespace-separated tokens
    // and emits (word, 1) for every token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as combiner): sums the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // conf.get returns the value as a String ("local" in a single-node
        // setup); conf.getStrings would print a raw String[] reference.
        System.out.println(conf.get("mapreduce.framework.name"));
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
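The map and reduce logic above can be checked locally without Hadoop at all. This sketch (class name and sample lines are my own, not from the post) reproduces the tokenize-then-sum behavior with plain Java collections:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class LocalWordCount {

    // Simulates TokenizerMapper + IntSumReducer on in-memory lines:
    // tokenize each line on whitespace, then sum the 1s per word.
    public static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line); // same tokenizer the mapper uses
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum); // the reducer's sum step
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count(new String[]{"hello world", "hello mapreduce"});
        System.out.println(c.get("hello"));    // 2
        System.out.println(c.get("world"));    // 1
    }
}
```

Because StringTokenizer splits only on whitespace, punctuation stays attached to words, which explains tokens such as `"Be` in the job's output below.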
Run
A sample of the text output under the output folder:
"Be	2
"Don't	1
16,	1
20.	1
20;	1
24	1
60	2
80.	1
All	1
But	3
...
Run complete!