Hadoop MapReduce: Self Join on a Single Column
Source: Internet  Editor: 程序博客网  Time: 2024/05/22 13:35
package mapreduce;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Self_join {

    // Each input line holds two space-separated columns: "child parent".
    public static class Map extends Mapper<Object, Text, Text, Text> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] ss = value.toString().split(" ", 2);
            if (ss.length < 2) {
                return; // skip lines that do not have two columns
            }
            // Keyed by the parent: the value is one of that person's children.
            context.write(new Text(ss[1]), new Text("left_" + ss[0]));
            // Keyed by the child: the value is one of that person's parents.
            context.write(new Text(ss[0]), new Text("right_" + ss[1]));
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void setup(Context context)
                throws IOException, InterruptedException {
            // Header row, written once per reduce task.
            context.write(new Text("grandchild"), new Text("grandparent"));
        }

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Local lists, so no state is shared between reduce() calls.
            List<String> ch = new ArrayList<String>(); // children of key
            List<String> g = new ArrayList<String>();  // parents of key
            for (Text value : values) {
                String p = value.toString();
                if (p.startsWith("left_")) {
                    ch.add(p.substring("left_".length()));
                } else if (p.startsWith("right_")) {
                    g.add(p.substring("right_".length()));
                }
            }
            // Cross product: every child of key pairs with every parent of key.
            // The original shared one iterator over g across all children, so
            // only the first child was ever paired.
            for (String c : ch) {
                for (String gp : g) {
                    context.write(new Text(c), new Text(gp));
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Selfjoin");
        job.setJarByClass(Self_join.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
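The tag-and-group idea behind the job above can be tried without a cluster. The following is only a minimal sketch in plain Java, not part of the original post: the class name `SelfJoinSketch`, the method `grandparents`, and the sample names are all hypothetical, and an in-memory `LinkedHashMap` stands in for the MapReduce shuffle. It assumes the same "child parent" line format and the same `left_` / `right_` tags as the job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone illustration; it does not use any Hadoop API.
public class SelfJoinSketch {

    /** Takes "child parent" lines and returns "grandchild grandparent" lines. */
    public static List<String> grandparents(List<String> lines) {
        // Map phase plus shuffle stand-in: person -> tagged values.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            String[] ss = line.split(" ", 2);
            if (ss.length < 2) {
                continue; // skip lines without two columns
            }
            // Under the parent's key, record the child.
            groups.computeIfAbsent(ss[1], k -> new ArrayList<>()).add("left_" + ss[0]);
            // Under the child's key, record the parent.
            groups.computeIfAbsent(ss[0], k -> new ArrayList<>()).add("right_" + ss[1]);
        }

        // Reduce phase: for each person, pair every child with every parent.
        List<String> out = new ArrayList<>();
        for (List<String> values : groups.values()) {
            List<String> children = new ArrayList<>();
            List<String> parents = new ArrayList<>();
            for (String v : values) {
                if (v.startsWith("left_")) {
                    children.add(v.substring("left_".length()));
                } else if (v.startsWith("right_")) {
                    parents.add(v.substring("right_".length()));
                }
            }
            for (String c : children) {
                for (String g : parents) {
                    out.add(c + " " + g);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data: Tom and Ann are children of Lucy,
        // and Lucy is a child of Mary and Ben.
        List<String> input = Arrays.asList("Tom Lucy", "Ann Lucy", "Lucy Mary", "Lucy Ben");
        for (String row : grandparents(input)) {
            System.out.println(row);
        }
    }
}
```

With this sample input, only the group for Lucy has both a child and a parent, so the sketch prints four rows: Tom Mary, Tom Ben, Ann Mary, Ann Ben. In a real job the order of rows depends on how keys are sorted and partitioned, so it may differ from this in-memory version.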