HBase MapReduce 使用
来源:互联网 发布:12 fan club动作数据 编辑:程序博客网 时间:2024/05/19 09:15
项目中需要用MapReduce来读取或者写Hbase,这样可以节省大量开发时间。
Hbase本身的jar包里就有这样的API , 以下是我从官网上找的一些资料,和大家分享一下。
原文地址:http://hbase.apache.org/book/mapreduce.example.html
总体说明一下:TableMapper 主要是读hbase数据,TableReducer 主要是写hbase数据。可以结合一起用,也可以分开用。
(一) 读Hbase实例
public static class MyMapper extends TableMapper<Text, Text> { public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException { // process data for the row from the Result instance. }}
Configuration config = HBaseConfiguration.create();Job job = new Job(config, "ExampleRead");job.setJarByClass(MyReadJob.class); // class that contains mapperScan scan = new Scan();scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobsscan.setCacheBlocks(false); // don't set to true for MR jobs// set other scan attrs...TableMapReduceUtil.initTableMapperJob( tableName, // input HBase table name scan, // Scan instance to control CF and attribute selection MyMapper.class, // mapper null, // mapper output key null, // mapper output value job);job.setOutputFormatClass(NullOutputFormat.class); // because we aren't emitting anything from mapperboolean b = job.waitForCompletion(true);if (!b) { throw new IOException("error with job!");}
(二) 读写实例
public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {// this example is just copying the data from the source table... context.write(row, resultToPut(row,value)); } private static Put resultToPut(ImmutableBytesWritable key, Result result) throws IOException { Put put = new Put(key.get()); for (KeyValue kv : result.raw()) {put.add(kv);}return put; }}
Configuration config = HBaseConfiguration.create();Job job = new Job(config,"ExampleReadWrite");job.setJarByClass(MyReadWriteJob.class); // class that contains mapperScan scan = new Scan();scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobsscan.setCacheBlocks(false); // don't set to true for MR jobs// set other scan attrsTableMapReduceUtil.initTableMapperJob(sourceTable, // input tablescan, // Scan instance to control CF and attribute selectionMyMapper.class, // mapper classnull, // mapper output keynull, // mapper output valuejob);TableMapReduceUtil.initTableReducerJob(targetTable, // output tablenull, // reducer classjob);job.setNumReduceTasks(0);boolean b = job.waitForCompletion(true);if (!b) { throw new IOException("error with job!");}
(三) 做统计实例
public static class MyMapper extends TableMapper<Text, IntWritable> {private final IntWritable ONE = new IntWritable(1); private Text text = new Text(); public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException { String val = new String(value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr1"))); text.set(val); // we can only emit Writables... context.write(text, ONE); }}
public static class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> { public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int i = 0; for (IntWritable val : values) { i += val.get(); } Put put = new Put(Bytes.toBytes(key.toString())); put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(i)); context.write(null, put); }}
Configuration config = HBaseConfiguration.create();Job job = new Job(config,"ExampleSummary");job.setJarByClass(MySummaryJob.class); // class that contains mapper and reducerScan scan = new Scan();scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobsscan.setCacheBlocks(false); // don't set to true for MR jobs// set other scan attrsTableMapReduceUtil.initTableMapperJob(sourceTable, // input tablescan, // Scan instance to control CF and attribute selectionMyMapper.class, // mapper classText.class, // mapper output keyIntWritable.class, // mapper output valuejob);TableMapReduceUtil.initTableReducerJob(targetTable, // output tableMyTableReducer.class, // reducer classjob);job.setNumReduceTasks(1); // at least one, adjust as requiredboolean b = job.waitForCompletion(true);if (!b) {throw new IOException("error with job!");}
(四)混合实例,结果存在文件上
public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {int i = 0;for (IntWritable val : values) {i += val.get();}context.write(key, new IntWritable(i));}}
Configuration config = HBaseConfiguration.create();Job job = new Job(config,"ExampleSummaryToFile");job.setJarByClass(MySummaryFileJob.class); // class that contains mapper and reducerScan scan = new Scan();scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobsscan.setCacheBlocks(false); // don't set to true for MR jobs// set other scan attrsTableMapReduceUtil.initTableMapperJob(sourceTable, // input tablescan, // Scan instance to control CF and attribute selectionMyMapper.class, // mapper classText.class, // mapper output keyIntWritable.class, // mapper output valuejob);job.setReducerClass(MyReducer.class); // reducer classjob.setNumReduceTasks(1); // at least one, adjust as requiredFileOutputFormat.setOutputPath(job, new Path("/tmp/mr/mySummaryFile")); // adjust directories as requiredboolean b = job.waitForCompletion(true);if (!b) {throw new IOException("error with job!");}
- HBase MapReduce 使用
- 使用mapreduce操作Hbase
- 使用MapReduce处理Hbase数据
- Hbase与mapreduce结合使用
- 【Hadoop】使用MapReduce操作HBase
- 在HBase里使用MapReduce例子
- Hadoop学习--HBase与MapReduce的使用
- Hadoop学习--HBase与MapReduce的使用
- MapReduce on HBase使用与集成
- 使用MapReduce结合HBase Filter过滤数据
- HBase常用功能和HBase+MapReduce使用总结
- hbase->Mapreduce->hbase
- HBase - MapReduce - 使用 MapReduce 批量操作 HBase 介绍 | 那伊抹微笑
- Hadoop MapReduce VS HBase MapReduce
- HBase中的MapReduce 使用多个Scan和多个表
- hadoop学习笔记之mapreduce中使用hbase
- 使用MapReduce读取SequenceFile文件,批量导入HBase
- 使用MapReduce写入数据到hbase的多个表中
- php学习笔记(二十三)PDO的预处理语句的查询以及结果集的使用
- js获取页面传递参数(完整版)
- 扩充Visual Studio 功能
- css html span 块状不换行
- Java、.net和C++,文科女生学习哪个好?
- HBase MapReduce 使用
- EurekaLog 6.1.0.4 Entprise 破解版
- android游戏开发——AndEngine游戏引擎(2)--------精灵sprite
- 两侧高度相互适应的CSS+div布局
- Clearing Corporation
- 总结过去10年的程序员生涯
- android 布局
- Design pattern学习笔记 – observer pattern
- 装饰模式