Hadoop Partitioner Component
Source: Internet. Editor: 程序博客网. Time: 2024/06/06 07:04
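1. The Partitioner component lets the map phase partition its output by key, so that records with different keys can be routed to different reduce tasks for processing.
2. You can define your own rule for distributing keys. For example, if the data file contains records for different provinces and the requirement is one output file per province, a custom Partitioner can send each province to its own reduce task.
3. Hadoop provides a default HashPartitioner,
in org.apache.hadoop.mapreduce.lib.partition.HashPartitioner.java: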
package org.apache.hadoop.mapreduce.lib.partition;

import org.apache.hadoop.mapreduce.Partitioner;

/** Partition keys by their {@link Object#hashCode()}. */
public class HashPartitioner<K, V> extends Partitioner<K, V> {

  /** Use {@link Object#hashCode()} to partition. */
  public int getPartition(K key, V value,
                          int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }

}
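To see what the default rule computes, here is a small, Hadoop-free sketch of the same arithmetic. It uses plain String keys and String.hashCode(); Hadoop's Text computes its hashCode() from its UTF-8 bytes, so the concrete partition numbers for Text keys will generally differ. What the sketch shows is why the hash is masked with Integer.MAX_VALUE: a negative hash code would otherwise produce a negative (illegal) partition, and Math.abs() is not a safe alternative because Math.abs(Integer.MIN_VALUE) overflows and stays negative. The class name HashPartitionDemo is hypothetical.

```java
public class HashPartitionDemo {

    // Same formula as HashPartitioner.getPartition(), applied to a raw hash code.
    static int partitionOf(int hash, int numReduceTasks) {
        // Clearing the sign bit keeps the value non-negative before the modulo.
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    // String keys only: uses String.hashCode(), not Hadoop's Text.hashCode().
    static int partition(String key, int numReduceTasks) {
        return partitionOf(key.hashCode(), numReduceTasks);
    }

    public static void main(String[] args) {
        String[] products = {"shoes", "hat", "stockings", "clothes"};
        for (String p : products) {
            System.out.println(p + " -> partition " + partition(p, 4));
        }

        // A negative hash code: a plain % keeps the sign of the dividend.
        System.out.println("-7 % 4 = " + (-7 % 4));              // -3
        System.out.println("masked: " + partitionOf(-7, 4));      // 1

        // Math.abs() overflows for Integer.MIN_VALUE, so the result is still negative.
        System.out.println("abs(MIN_VALUE) % 3 = " + (Math.abs(Integer.MIN_VALUE) % 3)); // -2
        System.out.println("masked: " + partitionOf(Integer.MIN_VALUE, 3));             // 0
    }
}
```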
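4. Custom Partitioner
1) Extend the abstract class Partitioner and implement your own getPartition() method.
2) Register the custom Partitioner with job.setPartitionerClass().
The abstract class is defined in org.apache.hadoop.mapreduce.Partitioner.java: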
package org.apache.hadoop.mapreduce;

/**
 * Partitions the key space.
 *
 * <p><code>Partitioner</code> controls the partitioning of the keys of the
 * intermediate map-outputs. The key (or a subset of the key) is used to derive
 * the partition, typically by a hash function. The total number of partitions
 * is the same as the number of reduce tasks for the job. Hence this controls
 * which of the <code>m</code> reduce tasks the intermediate key (and hence the
 * record) is sent for reduction.</p>
 *
 * @see Reducer
 */
public abstract class Partitioner<KEY, VALUE> {

  /**
   * Get the partition number for a given key (hence record) given the total
   * number of partitions i.e. number of reduce-tasks for the job.
   *
   * <p>Typically a hash function on a all or a subset of the key.</p>
   *
   * @param key the key to be partioned.
   * @param value the entry value.
   * @param numPartitions the total number of partitions.
   * @return the partition number for the <code>key</code>.
   */
  public abstract int getPartition(KEY key, VALUE value, int numPartitions);

}
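For the "one output file per province" case from point 2, the body of a custom getPartition() typically reduces to a lookup table. The sketch below keeps only that logic, without Hadoop types, so it runs on its own; the class name ProvincePartitionLogic and the province list are hypothetical. In a real job the same body would sit in a subclass of Partitioner (step 1), registered with job.setPartitionerClass() (step 2), with job.setNumReduceTasks() set to the number of provinces.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProvincePartitionLogic {

    // Hypothetical province list; the list index is the partition (reduce task) number.
    private static final List<String> PROVINCES =
            List.of("Beijing", "Shanghai", "Guangdong", "Zhejiang");

    private static final Map<String, Integer> INDEX = new HashMap<>();

    static {
        for (int i = 0; i < PROVINCES.size(); i++) {
            INDEX.put(PROVINCES.get(i), i);
        }
    }

    // What a custom getPartition(key, value, numPartitions) could compute for a province key.
    static int getPartition(String province, int numPartitions) {
        Integer idx = INDEX.get(province);
        if (idx == null) {
            // Unknown province: fall back to the HashPartitioner formula.
            return (province.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
        // The modulo keeps the result in [0, numPartitions) even with fewer reduce tasks.
        return idx % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = PROVINCES.size(); // one reduce task, hence one file, per province
        for (String p : PROVINCES) {
            System.out.println(p + " -> partition " + getPartition(p, numPartitions));
        }
    }
}
```

Unknown provinces land in the same file as one of the listed ones; if that is not acceptable, one option is to reserve an extra partition for them and raise the reduce-task count by one.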
Partitioner Example
Use case for a Partitioner:
Requirement: compute the weekly sales of each product separately.
Weekly sales list of site1:
shoes 20
hat 10
stockings 30
clothes 40
Weekly sales list of site2:
shoes 15
hat 1
stockings 90
clothes 80
Aggregated result:
shoes 35
hat 11
stockings 120
clothes 120
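The aggregated result is simply a per-product sum over both sites' lists. As a quick local check of the numbers (plain Java, not Hadoop code; the class name WeeklySalesTotals is hypothetical), the same "product quantity" lines can be summed in memory:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WeeklySalesTotals {

    // Sums the quantity per product over all "product quantity" lines.
    static Map<String, Integer> totals(List<String> lines) {
        Map<String, Integer> sums = new LinkedHashMap<>(); // keeps first-seen order
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 2) {
                continue; // skip blank or malformed lines
            }
            sums.merge(f[0], Integer.parseInt(f[1]), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> site1 = List.of("shoes 20", "hat 10", "stockings 30", "clothes 40");
        List<String> site2 = List.of("shoes 15", "hat 1", "stockings 90", "clothes 80");
        List<String> all = new ArrayList<>(site1);
        all.addAll(site2);
        // Prints: shoes 35, hat 11, stockings 120, clothes 120 (one per line)
        totals(all).forEach((product, sum) -> System.out.println(product + " " + sum));
    }
}
```

In the MapReduce version, the split-and-emit step belongs to the mapper and the summation to the reducer; the Partitioner only decides which reduce task (and therefore which output file) each product's sum ends up in.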
The code is as follows:
MyMapper.java
package com.partitioner;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line looks like "product quantity", e.g. "shoes 20"
        String[] s = value.toString().trim().split("\\s+");
        if (s.length < 2) {
            return; // skip blank or malformed lines
        }
        // Emit (product, quantity)
        context.write(new Text(s[0]), new IntWritable(Integer.parseInt(s[1])));
    }
}
MyPartitioner.java
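package com.partitioner;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class MyPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Route each product to its own reduce task (and hence its own output file)
        String product = key.toString();
        int partition;
        if (product.equals("shoes")) {
            partition = 0;
        } else if (product.equals("hat")) {
            partition = 1;
        } else if (product.equals("stockings")) {
            partition = 2;
        } else {
            partition = 3; // clothes and any other product
        }
        // Keep the result below numPartitions in case fewer than 4 reduce tasks are set
        return partition % numPartitions;
    }
}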
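Hadoop requires every value returned by getPartition() to satisfy 0 <= partition < numPartitions; otherwise the map task fails with an "Illegal partition" error. A hard-coded table is therefore only legal for a job whose reduce-task count exceeds the largest number in the table. The simulation below is a hypothetical, Hadoop-free model of the shuffle (the class PartitionRangeCheck and its route() helper are illustrative, not Hadoop APIs): it buckets keys per reduce task using a fixed shoes/hat/stockings/other to 0/1/2/3 table, with no modulo, and rejects out-of-range results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntBiFunction;

public class PartitionRangeCheck {

    // Fixed table in the style of MyPartitioner; numPartitions is deliberately ignored.
    static int fixedRule(String product, int numPartitions) {
        switch (product) {
            case "shoes":     return 0;
            case "hat":       return 1;
            case "stockings": return 2;
            default:          return 3;
        }
    }

    // Simulated shuffle: one bucket per reduce task, out-of-range partitions rejected.
    static List<List<String>> route(List<String> keys, int numPartitions,
                                    ToIntBiFunction<String, Integer> partitioner) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            int p = partitioner.applyAsInt(key, numPartitions);
            if (p < 0 || p >= numPartitions) {
                throw new IllegalStateException("Illegal partition for " + key + " (" + p + ")");
            }
            buckets.get(p).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("shoes", "hat", "stockings", "clothes");
        // Prints: 4 tasks: [[shoes], [hat], [stockings], [clothes]]
        System.out.println("4 tasks: " + route(keys, 4, PartitionRangeCheck::fixedRule));
        try {
            route(keys, 2, PartitionRangeCheck::fixedRule);
        } catch (IllegalStateException e) {
            // Prints: 2 tasks: Illegal partition for stockings (2)
            System.out.println("2 tasks: " + e.getMessage());
        }
    }
}
```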
MyReducer.java
package com.partitioner;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> value, Context context)
            throws IOException, InterruptedException {
        // Sum the weekly quantities reported for this product by all sites
        int sum = 0;
        for (IntWritable val : value) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
TestPartitioner.java
package com.partitioner;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class TestPartitioner {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: TestPartitioner <in> <out>");
            System.exit(2);
        }

        // Job.getInstance() replaces the deprecated new Job(conf, name) constructor
        Job job = Job.getInstance(conf, "weekly sales per product");
        job.setJarByClass(TestPartitioner.class);
        job.setMapperClass(MyMapper.class);
        // Optional: summing is associative, so MyReducer could also serve as the combiner
        // job.setCombinerClass(MyReducer.class);
        job.setReducerClass(MyReducer.class);

        // Custom partitioner plus one reduce task per partition (0..3)
        job.setPartitionerClass(MyPartitioner.class);
        job.setNumReduceTasks(4);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
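With job.setNumReduceTasks(4), each reduce task writes its own output file, so with MyPartitioner's table every file should hold exactly one product's total. The sketch below, plain Java and not part of the job, lists the files one would expect in the output directory; it assumes the default FileOutputFormat naming for new-API reducers, where reduce task n writes part-r- followed by n zero-padded to five digits. The class ExpectedOutputFiles is hypothetical, and the product-per-partition array is copied from MyPartitioner's table.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpectedOutputFiles {

    // Default output file name of reduce task n, e.g. part-r-00003 for n = 3.
    static String reduceOutputName(int partition) {
        return String.format("part-r-%05d", partition);
    }

    // One file per reduce task, whether or not the partition received any keys.
    static List<String> expectedFiles(int numReduceTasks) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            names.add(reduceOutputName(i));
        }
        return names;
    }

    public static void main(String[] args) {
        // Index = partition number returned by MyPartitioner for that product.
        String[] productByPartition = {"shoes", "hat", "stockings", "clothes (and any other product)"};
        List<String> files = expectedFiles(4);
        for (int i = 0; i < files.size(); i++) {
            System.out.println(files.get(i) + " <- " + productByPartition[i]);
        }
    }
}
```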