Reading and Writing the Hive ORC Data Format with MapReduce
Source: Internet | Editor: 程序博客网 | Published: 2024/06/05 16:35
1. The MapReduce code is as follows:

package com.test.hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.orc.TypeDescription;
import org.apache.orc.mapred.OrcStruct;
import org.apache.orc.mapreduce.OrcInputFormat;
import org.apache.orc.mapreduce.OrcOutputFormat;

public class ORCSample {

    public static class ORCMapper extends Mapper<NullWritable, OrcStruct, Text, Text> {
        @Override
        public void map(NullWritable key, OrcStruct value, Context output)
                throws IOException, InterruptedException {
            // Input columns: siteid (0), name (1), mobile (2); emit name -> mobile.
            output.write((Text) value.getFieldValue(1), (Text) value.getFieldValue(2));
        }
    }

    public static class ORCReducer extends Reducer<Text, Text, NullWritable, OrcStruct> {
        private final TypeDescription schema =
                TypeDescription.fromString("struct<name:string,mobile:string>");
        private final OrcStruct pair = (OrcStruct) OrcStruct.createValue(schema);
        private final NullWritable nw = NullWritable.get();

        @Override
        public void reduce(Text key, Iterable<Text> values, Context output)
                throws IOException, InterruptedException {
            for (Text val : values) {
                // Output columns: name (0), mobile (1).
                pair.setFieldValue(0, key);
                pair.setFieldValue(1, val);
                output.write(nw, pair);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("orc.mapred.output.schema", "struct<name:string,mobile:string>");
        Job job = Job.getInstance(conf, "ORC Test");
        job.setJarByClass(ORCSample.class);
        job.setMapperClass(ORCMapper.class);
        job.setReducerClass(ORCReducer.class);
        job.setInputFormatClass(OrcInputFormat.class);
        job.setOutputFormatClass(OrcOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(OrcStruct.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

2. Add the dependencies to pom.xml (based on Hadoop 2.7.1):
<dependencies>
    <dependency>
        <groupId>org.apache.orc</groupId>
        <artifactId>orc-mapreduce</artifactId>
        <version>1.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.7.1</version>
    </dependency>
</dependencies>
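Both the mapper and the reducer above address struct fields by position: the input table's columns are siteid (0), name (1), mobile (2), while the output schema has only name (0) and mobile (1). As a stdlib-only illustration (this helper is hypothetical, not part of the ORC API), the field order a given index refers to can be read straight off the schema string:

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaFields {
    // Extract top-level field names, in order, from a flat struct schema
    // string such as "struct<name:string,mobile:string>".
    // Nested struct/map/list types are deliberately not handled here.
    static List<String> fieldNames(String schema) {
        String inner = schema.substring(schema.indexOf('<') + 1, schema.lastIndexOf('>'));
        List<String> names = new ArrayList<>();
        for (String part : inner.split(",")) {
            names.add(part.split(":")[0].trim());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = fieldNames("struct<name:string,mobile:string>");
        // Index 0 is "name", index 1 is "mobile" -- matching
        // pair.setFieldValue(0, key) / pair.setFieldValue(1, val) in the reducer.
        System.out.println(names); // prints [name, mobile]
    }
}
```

If the schema and the setFieldValue indices drift apart, the job still runs but writes columns in the wrong order, so it is worth double-checking this mapping whenever the table definition changes.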
3. Create the tables, then add 3 rows of data to t_test_orc:
CREATE TABLE `t_test_orc` (
  `siteid` string,
  `name` string,
  `mobile` string)
STORED AS ORC;

CREATE TABLE `t_test_orc_new` (
  `name` string,
  `mobile` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION 'hdfs://namenode:9000/user/testorc3';
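The step above mentions adding three rows to t_test_orc but omits the statement. One way to do it, assuming Hive 0.14 or later (which added INSERT ... VALUES); the sample values here are made up for illustration:

```sql
INSERT INTO TABLE t_test_orc VALUES
  ('1', 'Alice', '13800000001'),
  ('2', 'Bob',   '13800000002'),
  ('3', 'Carol', '13800000003');
```

On older Hive versions, the rows can instead be loaded via INSERT ... SELECT from a staging table.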
4. Package the jar and run it:
hadoop jar MRTest-1.0-jar-with-dependencies.jar com.test.hadoop.ORCSample /hive/warehouse/mytest.db/t_test_orc /user/testorc3
5. After the job completes, the result can be inspected with hive --orcfiledump -d. Querying the ORC-format table t_test_orc_new in Hive also shows the data.
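For reference, the dump command takes the job's output path from step 4 (this assumes hive is on the PATH of a machine that can reach the cluster):

```shell
hive --orcfiledump -d /user/testorc3/part-r-00000
```

The -d flag prints the row data as JSON in addition to the file metadata, which makes it easy to verify the name/mobile pairs written by the reducer.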
For more information, see https://orc.apache.org/