Partitioner in Hadoop MapReduce


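A Partitioner divides the intermediate key/value pairs emitted by the Mappers according to their keys and assigns them to the Reducers. Each Mapper hands its intermediate output to the configured Partitioner, which ensures that every record is routed to its intended reduce task.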

Within each reducer, keys are processed in sorted order. Combiners are an optimization in MapReduce that allows local aggregation before the shuffle and sort phase. The primary goal of combiners is to save as much bandwidth as possible by minimizing the number of key/value pairs that will be shuffled across the network between mappers and reducers.

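We can think of combiners as "mini-reducers" that operate on mapper output before the shuffle and sort phase. Each combiner works on the output of a single mapper, so it cannot see the intermediate output of other mappers.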

1. Partitioner

A common misconception among first-time MapReduce programmers is that a program should use only a single reducer. After all, a single reducer sorts all of your data before processing it and stores the output in one single file, and who doesn't like sorted data? It is easy to see that such a constraint makes no sense: most of the time more than one reducer is necessary, otherwise the map/reduce concept would not be very useful.

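With multiple reducers, we need some way to decide which reducer each key/value pair emitted by a mapper is sent to. By default, a hash function is applied to the key to determine the reducer.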

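The partition phase takes place after the map phase and before the reduce phase. The number of partitions is equal to the number of reducers. The partition function assigns every record to a partition, and each partition is passed to its reducer. This approach improves overall performance because it allows mappers to operate completely independently: for every key/value pair it emits, each mapper decides on its own which reducer will receive it. Because all mappers use the same partition function, a given key maps to the same destination partition no matter which mapper produced it.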

A single partition refers to all key/value pairs that will be sent to a single reduce task. The number of reducers is configured with job.setNumReduceTasks. Hadoop ships with a default partitioner implementation, HashPartitioner, which hashes a record's key to determine the partition the record belongs to. Each partition is processed by one reduce task, so the number of partitions is equal to the number of reduce tasks for the job.
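As a rough illustration of the default behaviour, the sketch below applies the formula used by HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, in plain Java. The class name HashPartitionSketch and the use of String.hashCode() are assumptions made for illustration: in a real job the key is a Writable such as Text, whose hashCode is computed differently, so the actual partition numbers will not match.

```java
// Plain-Java sketch of hash partitioning; no Hadoop classes are involved.
class HashPartitionSketch {

    // Clear the sign bit so the remainder is never negative,
    // then fold the hash into the range [0, numReduceTasks).
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3;
        for (String key : new String[] {"Male", "Female", "Male"}) {
            // The same key always maps to the same partition,
            // regardless of which mapper emitted it.
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Because the function depends only on the key and the number of reduce tasks, every mapper can evaluate it independently and still agree on the destination.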

When the map function starts producing output, it is not simply written to disk. Each map task has a circular memory buffer that it uses to store the map output. When the contents of the buffer reach a certain threshold size, a background thread starts to spill the contents to disk. Map output continues to be written to the buffer while the spill takes place, but if the buffer fills up during this time, the map blocks until the spill is complete. Before it writes to disk, the thread first divides the data into partitions (by default, based on the key) corresponding to the reducers that they will ultimately be sent to. Within each partition, the background thread performs an in-memory sort by key, and if there is a combiner function, it is run on the output of the sort.
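The order of operations during a spill (partition, then sort by key, then combine) can be sketched in plain Java. Everything here is hypothetical: the class SpillSketch, the hash-based partition function, and the summing combiner are stand-ins, and a TreeMap performs the sort and the combine in a single step, whereas Hadoop sorts serialized records first and then runs the combiner on the sorted output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SpillSketch {

    // Hypothetical partition function: hash of the key, as in the default case.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Splits the buffered (key, count) records by partition. Each partition is a
    // TreeMap, so its keys stay sorted, and merge() sums the counts per key,
    // which plays the role of a summing combiner.
    static List<TreeMap<String, Integer>> spill(List<Map.Entry<String, Integer>> buffer,
                                                int numPartitions) {
        List<TreeMap<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> record : buffer) {
            int p = partitionFor(record.getKey(), numPartitions);
            partitions.get(p).merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Made-up buffer contents for illustration.
        List<Map.Entry<String, Integer>> buffer = List.of(
                Map.entry("pear", 1), Map.entry("apple", 1),
                Map.entry("pear", 1), Map.entry("fig", 1));
        List<TreeMap<String, Integer>> spilled = spill(buffer, 2);
        for (int p = 0; p < spilled.size(); p++) {
            System.out.println("partition " + p + ": " + spilled.get(p));
        }
    }
}
```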

2. Combiner

Combiners: a MapReduce job is limited by the bandwidth available on the cluster, so it pays to minimize the data transferred between map and reduce tasks. Hadoop allows the user to run a combiner on the mapper output, and the combiner's output forms the input to the reduce function. Because the combiner is optional and purely an optimization, Hadoop makes no guarantee about how many times it will be called for a particular mapper output record. In other words, running the combiner zero, one, or many times must leave the reducer's output unchanged. The point of running it is to make the map output more compact, so that less data is written to local disk and transferred to the reducers.

The combiner is provided keys and the values associated with each key (the same types as the mapper output keys and values). Critically, one cannot assume that a combiner will have the opportunity to process all values associated with the same key, because each combiner sees the intermediate output of only one mapper. A combiner can emit any number of key/value pairs, but the keys and values must be of the same types as the mapper output (and therefore the reducer input).

When an operation is both associative and commutative (for example, addition or multiplication), the reducer can be used directly as the combiner. In general, however, reducers and combiners are not interchangeable.
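To make the associativity requirement concrete, here is a minimal plain-Java sketch comparing two aggregations. The class CombinerSketch and the numbers are made up for illustration. Taking the maximum survives a per-mapper pre-aggregation, while taking the mean does not.

```java
import java.util.List;

class CombinerSketch {

    static int max(List<Integer> values) {
        int best = Integer.MIN_VALUE;
        for (int v : values) {
            best = Math.max(best, v);
        }
        return best;
    }

    static double mean(List<Integer> values) {
        double sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        // Values for one key, as emitted by two different mappers (made-up numbers).
        List<Integer> mapper1 = List.of(10, 20, 30);
        List<Integer> mapper2 = List.of(100);
        List<Integer> all = List.of(10, 20, 30, 100);

        // max: combining per mapper first gives the same answer as reducing everything.
        System.out.println(max(List.of(max(mapper1), max(mapper2))) == max(all));

        // mean: the mean of the per-mapper means differs from the true mean.
        double meanOfMeans = (mean(mapper1) + mean(mapper2)) / 2;
        System.out.println(meanOfMeans == mean(all));
    }
}
```

The first line printed is true (100 in both cases) and the second is false (60.0 versus 40.0). A mean can still be combined if the combiner emits partial sums and counts instead of means, which is one reason the combiner and the reducer often need different code.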

3. Differences

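The difference between a Partitioner and a Combiner is that the Partitioner divides the data according to the number of reducers, so that all data in a single partition is processed by a single reducer, whereas the Combiner works like a reducer and processes the data within each partition. The Combiner is an optimization of the Reducer. Partitioning the data by some other function of the key or value can be useful, but a Combiner does not necessarily improve performance. You should monitor the job's behavior to see whether the number of records the Combiner outputs is noticeably smaller than the number of records it receives. This is easy to check in the JobTracker Web UI.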

A simple example:

For convenience, assume we have an Employee table with the following data. We use this sample data as the input dataset to show how the Partitioner works.

Id      Name        Age   Gender   Salary
1201    gopal       45    Male     50,000
1202    manisha     40    Female   50,000
1203    khalil      34    Male     30,000
1204    prasanth    30    Male     30,000
1205    kiran       20    Male     40,000
1206    laxmi       25    Female   35,000
1207    bhavya      20    Female   15,000
1208    reshma      19    Female   15,000
1209    kranthi     22    Male     22,000
1210    Satish      24    Male     25,000
1211    Krishna     25    Male     25,000
1212    Arshad      28    Male     20,000
1213    lavanya     18    Female   8,000

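We write a program that processes the input dataset, groups the records by age (20 and under, 21 to 30, over 30), and finds the highest salary for each gender within each age group.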

2.1 Input Data

The data above is stored in the file input.txt in the directory /home/xiaosi/tmp/partitionerExample/input/, in the following format:

1201	gopal	45	Male	50000
1202	manisha	40	Female	51000
1203	khaleel	34	Male	30000
1204	prasanth	30	Male	31000
1205	kiran	20	Male	40000
1206	laxmi	25	Female	35000
1207	bhavya	20	Female	15000
1208	reshma	19	Female	14000
1209	kranthi	22	Male	22000
1210	Satish	24	Male	25000
1211	Krishna	25	Male	26000
1212	Arshad	28	Male	20000
1213	lavanya	18	Female	8000

Based on this input data, the algorithm is described below.

2.2 Map Task

The map task takes key/value pairs as input. Our text data is stored in a text file. The input to the map task is as follows:

2.2.1 Input

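With the default TextInputFormat (the mapper in the code section declares LongWritable input keys), the key is the byte offset of the line within the file; the mapper does not use it. The value is one line of data, for example value = 1201\tgopal\t45\tMale\t50000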

2.2.2 Method

Read one line of data, split it on \t with the split method, and store the gender in a variable:

String[] str = value.toString().split("\t", -3);
String gender = str[3];

Send the gender as the key and the record data as the value, as the output key/value pair from the map task to the partition task:

context.write(new Text(gender), new Text(value));

Repeat all the steps above for every record in the text file.

2.2.3 Output

A set of key/value pairs consisting of the gender and the record data.

2.3 Partition Task

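The partition task takes the key/value pairs from the map task as its input. Partitioning means dividing the data into segments. Following the partitioning rule given below, the input key/value pairs are divided into three parts based on age.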

2.3.1 Input

All the data in the collection of key/value pairs. key = the gender field of the record; value = the complete record data for that gender

2.3.2 Method

Read the age field from the value of the key/value pair:

String[] str = value.toString().split("\t");
int age = Integer.parseInt(str[2]);

Check the age value against the following rules:

age less than or equal to 20

age greater than 20 and less than or equal to 30

age greater than 30

if (age <= 20) {
   return 0;
}
else if (age > 20 && age <= 30) {
   return 1 % numReduceTask;
}
else {
   return 2 % numReduceTask;
}

2.3.3 Output

All the key/value pairs are split into three collections of key/value pairs. Each collection is processed by a reducer.
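The modulo in the rule above only matters when the job runs with fewer than three reduce tasks. The plain-Java sketch below (the class name AgePartitionSketch is an assumption, and no Hadoop classes are used) applies the same rule to three of the sample records with 3, 2, and 1 reduce tasks.

```java
class AgePartitionSketch {

    // Same rule as above: age <= 20 -> 0, 21..30 -> 1, over 30 -> 2,
    // folded into the available reduce tasks with the modulo operator.
    static int partitionFor(String record, int numReduceTasks) {
        int age = Integer.parseInt(record.split("\t")[2]);
        if (numReduceTasks == 0) {
            return 0;
        }
        if (age <= 20) {
            return 0;
        }
        if (age <= 30) {
            return 1 % numReduceTasks;
        }
        return 2 % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] records = {
            "1208\treshma\t19\tFemale\t14000",
            "1206\tlaxmi\t25\tFemale\t35000",
            "1201\tgopal\t45\tMale\t50000"
        };
        for (int n : new int[] {3, 2, 1}) {
            for (String r : records) {
                System.out.println(n + " reduce tasks: " + r.split("\t")[1]
                        + " -> partition " + partitionFor(r, n));
            }
        }
    }
}
```

With three reduce tasks each age group gets its own partition. With two, the over-30 group falls into partition 0 together with the 20-and-under group (2 % 2 = 0), so the per-group maxima would no longer be separated. With one, everything lands in partition 0.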

2.4 Reduce Task

The number of partitions equals the number of reduce tasks. Here we have three partitions, so three reduce tasks are executed.

2.4.1 Input

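The reducer is executed three times, each time with a different collection of key/value pairs. key = the gender field of the record; value = the complete record data for that gender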

2.4.2 Method

Read the Salary field from the record data:

String[] str = value.toString().split("\t", -3);
int salary = Integer.parseInt(str[4]);

Track the maximum salary:

if (salary > max) {
   max = salary;
}

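Repeat the steps above for the data in each key collection (Male and Female are the two key collections), starting max from its initial value for each key. After these steps, we obtain one highest salary from the Female collection and one highest salary from the Male collection.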

context.write(new Text(key), new IntWritable(max));

2.4.3 Output

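Finally, we obtain a set of key/value pairs in three collections, one per age group. Each contains the highest salary among the men and the highest salary among the women of that age group.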

After the Map, Partition, and Reduce tasks have run, the three collections of key/value pairs are stored in three different output files.
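The whole map, partition, and reduce flow can be simulated in memory on the 13 sample records. This is only a sketch under stated assumptions: the class MaxSalaryByAgeGroupSketch is hypothetical, no Hadoop classes are used, and one TreeMap per partition stands in for a reduce task (it keeps the keys sorted and tracks the maximum salary per gender).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class MaxSalaryByAgeGroupSketch {

    // Partition rule from section 2.3.
    static int partitionFor(int age, int numReduceTasks) {
        if (age <= 20) {
            return 0;
        }
        if (age <= 30) {
            return 1 % numReduceTasks;
        }
        return 2 % numReduceTasks;
    }

    // Returns one sorted map (gender -> highest salary) per partition.
    static List<TreeMap<String, Integer>> run(List<String> lines, int numReduceTasks) {
        List<TreeMap<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeMap<>());
        }
        for (String line : lines) {
            String[] fields = line.split("\t");
            String gender = fields[3];                 // map: key = gender
            int age = Integer.parseInt(fields[2]);     // partition: by age group
            int salary = Integer.parseInt(fields[4]);  // reduce: maximum per key
            partitions.get(partitionFor(age, numReduceTasks))
                      .merge(gender, salary, Integer::max);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "1201\tgopal\t45\tMale\t50000", "1202\tmanisha\t40\tFemale\t51000",
                "1203\tkhaleel\t34\tMale\t30000", "1204\tprasanth\t30\tMale\t31000",
                "1205\tkiran\t20\tMale\t40000", "1206\tlaxmi\t25\tFemale\t35000",
                "1207\tbhavya\t20\tFemale\t15000", "1208\treshma\t19\tFemale\t14000",
                "1209\tkranthi\t22\tMale\t22000", "1210\tSatish\t24\tMale\t25000",
                "1211\tKrishna\t25\tMale\t26000", "1212\tArshad\t28\tMale\t20000",
                "1213\tlavanya\t18\tFemale\t8000");
        List<TreeMap<String, Integer>> result = run(lines, 3);
        for (int p = 0; p < result.size(); p++) {
            System.out.println("partition " + p + ": " + result.get(p));
        }
    }
}
```

For the sample input this sketch prints {Female=15000, Male=40000} for partition 0, {Female=35000, Male=31000} for partition 1, and {Female=51000, Male=50000} for partition 2, that is, six output records in three groups.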

3. Code

package com.sjf.open.test;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.JobPriority;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import com.sjf.open.utils.FileSystemUtil;
public class PartitionerExample extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        int status = ToolRunner.run(new PartitionerExample(), args);
        System.exit(status);
    }
    private static class mapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            try {
                String[] str = value.toString().split("\t", -3);
                String gender = str[3];
                context.write(new Text(gender), new Text(value));
            } catch (Exception e) {
                System.out.println(e.getMessage());
            }
        }
    }
    private static class reducer extends Reducer<Text, Text, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // max must be local to each call: as an instance field it would carry the
            // previous key's maximum over to the next key handled by the same reduce task.
            int max = Integer.MIN_VALUE;
            for (Text value : values) {
                String[] str = value.toString().split("\t", -3);
                int salary = Integer.parseInt(str[4]);
                if (salary > max) {
                    max = salary;
                }
            }
            context.write(new Text(key), new IntWritable(max));
        }
    }
    private static class partitioner extends Partitioner<Text, Text> {
        @Override
        public int getPartition(Text key, Text value, int numReduceTask) {
            System.out.println(key.toString() + "------" + value.toString());
            String[] str = value.toString().split("\t");
            int age = Integer.parseInt(str[2]);
            if (numReduceTask == 0) {
                return 0;
            }
            if (age <= 20) {
                return 0;
            }
            else if (age > 20 && age <= 30) {
                return 1 % numReduceTask;
            }
            else {
                return 2 % numReduceTask;
            }
        }
    }
    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("./run <input> <output>");
            System.exit(1);
        }
        String inputPath = args[0];
        String outputPath = args[1];
        int numReduceTasks = 3;
        Configuration conf = this.getConf();
        conf.set("mapred.job.queue.name", "test");
        conf.set("mapreduce.map.memory.mb", "1024");
        conf.set("mapreduce.reduce.memory.mb", "1024");
        conf.setBoolean("mapred.output.compress", true);
        conf.setClass("mapred.output.compression.codec", GzipCodec.class, CompressionCodec.class);
        Job job = Job.getInstance(conf);
        job.setJarByClass(PartitionerExample.class);
        job.setPartitionerClass(partitioner.class);
        job.setMapperClass(mapper.class);
        job.setReducerClass(reducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileSystem fileSystem = FileSystem.get(conf);
        fileSystem.delete(new Path(outputPath), true);
        FileSystemUtil.filterNoExistsFile(conf, job, inputPath);
        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        job.setNumReduceTasks(numReduceTasks);
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }
}

4. Running on the cluster

17/01/03 20:22:02 INFO mapreduce.Job: Running job: job_1472052053889_7059198
17/01/03 20:22:21 INFO mapreduce.Job: Job job_1472052053889_7059198 running in uber mode : false
17/01/03 20:22:21 INFO mapreduce.Job:  map 0% reduce 0%
17/01/03 20:22:37 INFO mapreduce.Job:  map 100% reduce 0%
17/01/03 20:22:55 INFO mapreduce.Job:  map 100% reduce 100%
17/01/03 20:22:55 INFO mapreduce.Job: Job job_1472052053889_7059198 completed successfully
17/01/03 20:22:56 INFO mapreduce.Job: Counters: 43
        File System Counters
                FILE: Number of bytes read=470
                FILE: Number of bytes written=346003
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=485
                HDFS: Number of bytes written=109
                HDFS: Number of read operations=12
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=6
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=3
                Rack-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=5559
                Total time spent by all reduces in occupied slots (ms)=164768
        Map-Reduce Framework
                Map input records=13
                Map output records=13
                Map output bytes=426
                Map output materialized bytes=470
                Input split bytes=134
                Combine input records=0
                Combine output records=0
                Reduce input groups=6
                Reduce shuffle bytes=470
                Reduce input records=13
                Reduce output records=6
                Spilled Records=26
                Shuffled Maps =3
                Failed Shuffles=0
                Merged Map outputs=3
                GC time elapsed (ms)=31
                CPU time spent (ms)=2740
                Physical memory (bytes) snapshot=1349193728
                Virtual memory (bytes) snapshot=29673148416
                Total committed heap usage (bytes)=6888620032
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=351
        File Output Format Counters 
                Bytes Written=109
