Inverted Index Implementation
Source: Internet  Editor: 程序博客网  Time: 2024/06/04 17:58
Introduction: reference article:
Input and output:
Input:
a.txt:
hadoop hello world
hello test
test hadoop
b.txt:
test world
hello world test
hadoop
Output:
hadoop hdfs://h2/apps/ca/yanh/data/invertedIndex/a.txt-->2 hdfs://h2/apps/ca/yanh/data/invertedIndex/b.txt-->1
hello hdfs://h2/apps/ca/yanh/data/invertedIndex/b.txt-->1 hdfs://h2/apps/ca/yanh/data/invertedIndex/a.txt-->2
test hdfs://h2/apps/ca/yanh/data/invertedIndex/a.txt-->2 hdfs://h2/apps/ca/yanh/data/invertedIndex/b.txt-->2
world hdfs://h2/apps/ca/yanh/data/invertedIndex/b.txt-->2 hdfs://h2/apps/ca/yanh/data/invertedIndex/a.txt-->1
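To make the expected result easy to check without a cluster, the same index can be built in memory with plain Java. This is only a sketch for verification, not the MapReduce job itself: the class name InvertedIndexSketch and the helpers sampleDocs and build are hypothetical, and short file names stand in for the full hdfs:// paths.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class InvertedIndexSketch {

    /** The two sample files from the input listing, keyed by a short file name. */
    static Map<String, String> sampleDocs() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "hadoop hello world\nhello test\ntest hadoop");
        docs.put("b.txt", "test world\nhello world test\nhadoop");
        return docs;
    }

    /** Builds word -> (file -> occurrence count); TreeMap keeps words and files sorted. */
    static Map<String, Map<String, Integer>> build(Map<String, String> docs) {
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String word : doc.getValue().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // skip empty tokens caused by leading whitespace
                }
                index.computeIfAbsent(word, w -> new TreeMap<>())
                     .merge(doc.getKey(), 1, Integer::sum);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Print one line per word in the same "file-->count" layout as the listing above.
        for (Map.Entry<String, Map<String, Integer>> e : build(sampleDocs()).entrySet()) {
            StringBuilder line = new StringBuilder(e.getKey());
            for (Map.Entry<String, Integer> posting : e.getValue().entrySet()) {
                line.append('\t').append(posting.getKey()).append("-->").append(posting.getValue());
            }
            System.out.println(line);
        }
    }
}
```

The per-file counts should agree with the listing above. The order of files within a line is alphabetical in this sketch, whereas the Hadoop output order can vary from word to word.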
Implementation code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * Created by 浩 on 2015/4/20.
 */
public class InvertedIndex {
    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(InvertedIndex.class);

        job.setMapperClass(InvertedMapper.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // args[0]: input path

        job.setCombinerClass(InvertedCombiner.class);

        job.setReducerClass(InvertedReducer.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // args[1]: output path
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

// Emits ("word-->filePath", "1") for every word on the line.
class InvertedMapper extends Mapper<LongWritable, Text, Text, Text> {
    private Text k = new Text();
    private Text v = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] words = line.split(" ");
        // The path of the file this split belongs to identifies the document.
        FileSplit inputSplit = (FileSplit) context.getInputSplit();
        Path path = inputSplit.getPath();
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // blank lines and repeated spaces yield empty tokens
            }
            k.set(word + "-->" + path.toString());
            v.set("1");
            context.write(k, v);
        }
    }
}

// Sums the counts for one "word-->filePath" key and re-keys the result
// as (word, "filePath-->count").
// Caution: Hadoop may run a combiner zero, one, or several times;
// this job relies on it running exactly once.
class InvertedCombiner extends Reducer<Text, Text, Text, Text> {
    private Text k = new Text();
    private Text v = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String[] fields = key.toString().split("-->"); // [word, filePath]
        int sum = 0;
        for (Text val : values) {
            sum += Integer.parseInt(val.toString());
        }
        k.set(fields[0]);
        v.set(fields[1] + "-->" + sum);
        context.write(k, v);
    }
}

// Joins all "filePath-->count" entries of a word into one tab-separated line.
class InvertedReducer extends Reducer<Text, Text, Text, Text> {
    private Text k = new Text();
    private Text v = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        StringBuilder str = new StringBuilder();
        for (Text val : values) {
            str.append(val.toString()).append("\t");
        }
        k.set(key);
        v.set(str.toString());
        context.write(k, v);
    }
}
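Hadoop treats a combiner as an optional optimization and may apply it zero, one, or several times, while InvertedCombiner above changes the key format and so must run exactly once for InvertedReducer to see the values it expects. One possible way around this, shown here only as a plain-Java sketch and not as the author's method, is to let the mapper emit (word, "path-->1") and use a single merge step whose input and output share the same format, so it can serve as both combiner and reducer. The names PostingMerge and merge are hypothetical, and no Hadoop classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PostingMerge {

    /**
     * Sums "path-->count" postings per path. Input and output use one format,
     * so the result can be fed back into this method without changing the totals.
     */
    static List<String> merge(Iterable<String> postings) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String posting : postings) {
            int sep = posting.lastIndexOf("-->"); // last occurrence, in case the path contains "-->"
            String path = posting.substring(0, sep);
            int count = Integer.parseInt(posting.substring(sep + 3));
            totals.merge(path, count, Integer::sum);
        }
        List<String> merged = new ArrayList<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            merged.add(e.getKey() + "-->" + e.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical map output for the word "hadoop": one posting per occurrence.
        List<String> raw = Arrays.asList("a.txt-->1", "b.txt-->1", "a.txt-->1");
        System.out.println(merge(raw));        // reducer only, no combiner pass
        System.out.println(merge(merge(raw))); // one combiner pass, then the reducer
    }
}
```

Both calls print [a.txt-->2, b.txt-->1], which is the property that makes the combiner safe to skip or repeat.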