Custom Data Types in MapReduce
Source: Internet · Editor: 程序博客网 · Date: 2024/04/30 00:36
Classes that implement the WritableComparable interface (which includes most classes in the org.apache.hadoop.io package) can be used as key or value types for a Mapper or Reducer in MapReduce. Strictly speaking, a value type only needs to implement Writable; a key type must also be sortable, which is why keys implement WritableComparable (Writable plus Comparable). The Hadoop framework ships with a number of WritableComparable implementations (FlowBean below is custom, not built in).

As you can see, the built-in classes are wrappers around the simple types: integers (IntWritable, LongWritable), floating point (FloatWritable, DoubleWritable), booleans (BooleanWritable), and strings (Text). In practice you often need a richer record type, so you define your own. Before writing a custom class, it is worth studying the built-in LongWritable type:
```java
public class LongWritable implements WritableComparable<LongWritable> {

  private long value;

  public LongWritable() {}

  public LongWritable(long value) {
    set(value);
  }

  /** Set the value of this LongWritable. */
  public void set(long value) {
    this.value = value;
  }

  /** Return the value of this LongWritable. */
  public long get() {
    return value;
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    value = in.readLong();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(value);
  }

  /** Returns true iff <code>o</code> is a LongWritable with the same value. */
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof LongWritable)) return false;
    LongWritable other = (LongWritable) o;
    return this.value == other.value;
  }

  @Override
  public int hashCode() {
    return (int) value;
  }

  /** Compares two LongWritables. */
  @Override
  public int compareTo(LongWritable o) {
    long thisValue = this.value;
    long thatValue = o.value;
    return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
  }

  @Override
  public String toString() {
    return Long.toString(value);
  }
}
```
LongWritable consists of a single private field, get/set methods for that field, a no-arg constructor and a one-arg constructor, the readFields and write methods from the Writable side of the interface, the compareTo method from the Comparable side, and an overridden toString.
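To see what the write/readFields pair actually does on the wire, here is a minimal sketch of the same round trip using only the JDK's stream classes (no Hadoop dependency; the class and variable names are illustrative, not from the original article). `write` pushes the raw field bytes onto a DataOutput, and `readFields` pulls them back off a DataInput in the same order:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTripDemo {
    public static void main(String[] args) throws IOException {
        long value = 42L;

        // Serialize: this is what LongWritable.write(out) does internally
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(value);

        // Deserialize: this is what LongWritable.readFields(in) does internally
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        long restored = in.readLong();

        System.out.println(restored); // prints 42
    }
}
```

The framework performs exactly this round trip whenever a key or value crosses a machine or process boundary (map output spill, shuffle, reduce input), which is why the read order must mirror the write order byte for byte.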
Following this analysis, a custom data type should provide the same set of methods. For example, FlowBean, a bean used to aggregate mobile phone traffic:
```java
package sempp.lsl.hadoop.mr.flowSum;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class FlowBean implements WritableComparable<FlowBean> {

  private String phone;
  private long uploadFlow;
  private long downloadFlow;
  private long sumFlow;

  // The framework instantiates the bean via reflection during deserialization,
  // which requires a no-arg constructor
  public FlowBean() {}

  // Convenience constructor for initialization
  public FlowBean(String phone, long uploadFlow, long downloadFlow) {
    this.phone = phone;
    this.uploadFlow = uploadFlow;
    this.downloadFlow = downloadFlow;
    this.sumFlow = uploadFlow + downloadFlow;
  }

  public String getPhone() { return phone; }
  public void setPhone(String phone) { this.phone = phone; }
  public long getUploadFlow() { return uploadFlow; }
  public void setUploadFlow(long uploadFlow) { this.uploadFlow = uploadFlow; }
  public long getDownloadFlow() { return downloadFlow; }
  public void setDownloadFlow(long downloadFlow) { this.downloadFlow = downloadFlow; }
  public long getSumFlow() { return sumFlow; }
  public void setSumFlow(long sumFlow) { this.sumFlow = sumFlow; }

  // Serialization: convert the object's fields to bytes and write them to the stream
  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(phone);
    out.writeLong(uploadFlow);
    out.writeLong(downloadFlow);
    out.writeLong(sumFlow);
  }

  // Deserialization: read the fields back in exactly the order they were written
  @Override
  public void readFields(DataInput in) throws IOException {
    phone = in.readUTF();
    uploadFlow = in.readLong();
    downloadFlow = in.readLong();
    sumFlow = in.readLong();
  }

  /** toString determines how the bean is rendered in the job output. */
  @Override
  public String toString() {
    return phone + "\t" + uploadFlow + "\t" + downloadFlow + "\t" + sumFlow;
  }

  /** When the bean is used as a key, sort in descending order of sumFlow. */
  @Override
  public int compareTo(FlowBean o) {
    // Long.compare returns 0 for equal totals, satisfying the compareTo contract
    return Long.compare(o.sumFlow, this.sumFlow);
  }
}
```
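The key constraint in FlowBean is that readFields must consume fields in exactly the order write produced them. The sketch below demonstrates that round trip with a trimmed stand-in class (the Hadoop WritableComparable interface is omitted so the example compiles with the JDK alone; the names MiniFlowBean and FlowBeanDemo are invented for this sketch):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Trimmed stand-in for FlowBean: same fields, same write/readFields ordering
class MiniFlowBean {
    String phone;
    long uploadFlow, downloadFlow, sumFlow;

    MiniFlowBean() {}  // no-arg constructor, as the framework's reflection requires

    MiniFlowBean(String phone, long up, long down) {
        this.phone = phone;
        this.uploadFlow = up;
        this.downloadFlow = down;
        this.sumFlow = up + down;
    }

    void write(DataOutput out) throws IOException {
        out.writeUTF(phone);
        out.writeLong(uploadFlow);
        out.writeLong(downloadFlow);
        out.writeLong(sumFlow);
    }

    void readFields(DataInput in) throws IOException {
        phone = in.readUTF();          // must mirror write() field for field
        uploadFlow = in.readLong();
        downloadFlow = in.readLong();
        sumFlow = in.readLong();
    }
}

public class FlowBeanDemo {
    public static void main(String[] args) throws IOException {
        MiniFlowBean original = new MiniFlowBean("13700000000", 100, 200);

        // Serialize to an in-memory buffer
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buf));

        // Deserialize into a fresh instance, as the framework would
        MiniFlowBean restored = new MiniFlowBean();
        restored.readFields(
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));

        System.out.println(restored.phone + "\t" + restored.sumFlow);
    }
}
```

If the read order ever diverged from the write order (say, reading uploadFlow before phone), readUTF would try to interpret numeric bytes as a string length and the deserialization would fail or silently corrupt the fields; keeping the two methods symmetrical is the whole contract.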