MapReduce Programming (5): Single-Table Join

Source: Internet | Editor: 程序博客网 | Time: 2024/05/22 22:22

1. Problem Description

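Below is a child-parent table. The task is to mine the parent-child relationships in it and produce a table of grandchild-grandparent relationships.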

The input file contains:

child    parent
Steven   Lucy
Steven   Jack
Jone     Lucy
Jone     Jack
Lucy     Mary
Lucy     Frank
Jack     Alice
Jack     Jesse
David    Alice
David    Jesse
Philip   David
Philip   Alma
Mark     David
Mark     Alma

Grandchild-grandparent relationships are derived by linking the parent and child generations. For example:

Steven   Jack
Jack     Alice
Jack     Jesse

From these three records, Jack is Steven's elder (parent), and Alice and Jesse are Jack's elders, so Steven is clearly the grandson of Alice and Jesse. The mined result is:

grandson    grandparent
Steven      Jesse
Steven      Alice

The goal is to find all grandchild-grandparent relationships using MapReduce.
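For a small table, the expected answer can be computed directly as a self-join of the table with itself on the condition "parent in one row equals child in another row". The sketch below is a plain in-memory Java baseline, not MapReduce; the class name `GrandparentBaseline` and the `String[][]` row layout `{child, parent}` are illustrative assumptions. It can serve as a reference for checking the MapReduce output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory baseline: nested-loop self-join of a child-parent table.
public class GrandparentBaseline {

    // Each row is {child, parent}. Returns {grandchild, grandparent} pairs.
    public static List<String[]> grandparents(String[][] rows) {
        List<String[]> result = new ArrayList<>();
        for (String[] lower : rows) {          // lower = {grandchild, parent}
            for (String[] upper : rows) {      // upper = {parent, grandparent}
                // Join condition: the parent in the lower row is the child in the upper row.
                if (lower[1].equals(upper[0])) {
                    result.add(new String[]{lower[0], upper[1]});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Steven", "Jack"}, {"Jack", "Alice"}, {"Jack", "Jesse"}
        };
        for (String[] pair : grandparents(rows)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

The nested loop is quadratic in the number of rows, which is fine for checking a toy table but is exactly what the MapReduce version avoids by letting the shuffle bring matching rows together.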

2. Analysis

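Solving this problem needs a small trick: the single-table join (joining the table with itself). In the Map phase, each input line is emitted both as key-value (child, parent) and as value-key (parent, child). Take two of the lines as an example: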

Steven   Jack
Jack     Alice

Emitting both key-value and value-key turns these into 4 lines:

Steven   Jack
Jack     Alice
Jack     Steven
Alice    Jack
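The doubling step, and the group-by-key that the shuffle then performs, can be imitated in memory to see why Jack ends up as a bridge. In this hypothetical sketch, `ShuffleDemo` and `emitBothWaysAndGroup` are illustrative names, not Hadoop APIs, and a `TreeMap` stands in for the shuffle's sorting and grouping of keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of "emit key-value and value-key, then shuffle".
public class ShuffleDemo {

    // Each row is {child, parent}; returns key -> values, with keys sorted as the shuffle sorts them.
    public static Map<String, List<String>> emitBothWaysAndGroup(String[][] rows) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] row : rows) {
            // key-value: child -> parent
            groups.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
            // value-key: parent -> child
            groups.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[][] rows = {{"Steven", "Jack"}, {"Jack", "Alice"}};
        // Prints {Alice=[Jack], Jack=[Steven, Alice], Steven=[Jack]}
        System.out.println(emitBothWaysAndGroup(rows));
    }
}
```

In a real Hadoop job the keys arrive sorted, but the order of values inside one key's group is not guaranteed, so `Jack` might see `[Alice, Steven]` instead.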

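After the shuffle, Jack as a key acts as the bridge between the generations above and below it: the values for Jack include Alice and Steven, so Alice and Steven must be grandparent and grandchild. To mark which values belong to the grandchild generation and which to the grandparent generation, a prefix can be added in the Map phase, for example "-" for the younger generation and "+" for the elder. With the prefixes in place, the Reduce phase can split each key's values into the two groups by prefix and pair every "-" value with every "+" value.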
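The tagged map, shuffle and reduce steps described above can be simulated end to end without a cluster. This is a minimal local sketch, assuming rows are `{child, parent}` string pairs; `TaggedSelfJoin`, `mapAndShuffle` and `reduce` are hypothetical names, and the `TreeMap` again stands in for Hadoop's shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation of the tagged single-table join (no Hadoop involved).
public class TaggedSelfJoin {

    // Map + shuffle: for each {child, parent} row, emit (parent, "-child") and (child, "+parent"),
    // then group the emitted values by key.
    public static Map<String, List<String>> mapAndShuffle(String[][] rows) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] row : rows) {
            String child = row[0];
            String parent = row[1];
            groups.computeIfAbsent(parent, k -> new ArrayList<>()).add("-" + child);  // younger generation
            groups.computeIfAbsent(child, k -> new ArrayList<>()).add("+" + parent);  // older generation
        }
        return groups;
    }

    // Reduce: within each key, pair every "-" value (grandchild) with every "+" value (grandparent).
    public static List<String> reduce(Map<String, List<String>> groups) {
        List<String> out = new ArrayList<>();
        for (List<String> values : groups.values()) {
            List<String> grandchildren = new ArrayList<>();
            List<String> grandparents = new ArrayList<>();
            for (String v : values) {
                if (v.startsWith("-")) {
                    grandchildren.add(v.substring(1));
                } else {
                    grandparents.add(v.substring(1));
                }
            }
            for (String gc : grandchildren) {
                for (String gp : grandparents) {
                    out.add(gc + "\t" + gp);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Steven", "Lucy"}, {"Steven", "Jack"}, {"Jone", "Lucy"}, {"Jone", "Jack"},
            {"Lucy", "Mary"}, {"Lucy", "Frank"}, {"Jack", "Alice"}, {"Jack", "Jesse"},
            {"David", "Alice"}, {"David", "Jesse"}, {"Philip", "David"}, {"Philip", "Alma"},
            {"Mark", "David"}, {"Mark", "Alma"}
        };
        System.out.println("grandson\tgrandparent");
        reduce(mapAndShuffle(rows)).forEach(System.out::println);
    }
}
```

On the full table this yields 12 pairs, produced under the bridge keys David, Jack and Lucy; keys that have only "-" values or only "+" values (for example Alice or Steven) contribute nothing, which is how the cross product filters out non-grandparent pairs.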

3. MapReduce Program

package com.javacore.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Created by bee on 3/29/17.
 */
public class RelationShip {

    public static class RsMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // TextInputFormat already delivers one line per call; split it on whitespace.
            String[] fields = value.toString().trim().split("\\s+");
            // Skip blank or malformed lines and the "child parent" header.
            // (A static line counter only works if the whole file goes to a single mapper.)
            if (fields.length < 2 || "child".equals(fields[0])) {
                return;
            }
            String son = fields[0];
            String parent = fields[1];
            // Keyed by the parent: the child belongs to the younger generation ("-").
            context.write(new Text(parent), new Text("-" + son));
            // Keyed by the child: the parent belongs to the older generation ("+").
            context.write(new Text(son), new Text("+" + parent));
        }
    }

    public static class RsReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Write the header once per reduce task (once in total with the default single reducer).
            context.write(new Text("grandson"), new Text("grandparent"));
        }

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            List<String> grandChild = new ArrayList<>();
            List<String> grandParent = new ArrayList<>();
            for (Text val : values) {
                // Hadoop reuses the Text instance, so copy its content out as a String.
                String s = val.toString();
                if (s.startsWith("-")) {
                    grandChild.add(s.substring(1));
                } else if (s.startsWith("+")) {
                    grandParent.add(s.substring(1));
                }
            }
            // Every grandchild under this bridge key pairs with every grandparent under it.
            for (String child : grandChild) {
                for (String parent : grandParent) {
                    context.write(new Text(child), new Text(parent));
                }
            }
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Use the paths from the command line if given, otherwise the original defaults.
        String[] otherArgs = args.length == 2 ? args : new String[]{"input/relations/table.txt", "output"};
        Configuration conf = new Configuration();
        Path outputPath = new Path(otherArgs[1]);
        // Remove a stale output directory; the job fails if it already exists.
        outputPath.getFileSystem(conf).delete(outputPath, true);

        Job job = Job.getInstance(conf, "single table join");
        job.setJarByClass(RelationShip.class);
        job.setMapperClass(RsMapper.class);
        job.setReducerClass(RsReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, outputPath);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

4. Output

grandson    grandparent
Mark        Jesse
Mark        Alice
Philip      Jesse
Philip      Alice
Jone        Jesse
Jone        Alice
Steven      Jesse
Steven      Alice
Steven      Frank
Steven      Mary
Jone        Frank
Jone        Mary