On the object reuse mechanism in MapReduce

Source: Internet  Editor: 程序博客网  Time: 2024/06/11 05:31

Today I wrote a Writable. The code is as follows:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Writable;

    public class CFWritable implements Writable {
        private IntWritable mark; // flag
        private List<ItemWritable> items;

        public CFWritable() {
            mark = new IntWritable(0);
            items = new ArrayList<ItemWritable>(2);
        }

        public CFWritable(int mark, List<ItemWritable> items) {
            this.mark = new IntWritable(mark);
            this.items = items;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            // Layout: item count, then the flag, then each item
            out.writeInt(items.size());
            mark.write(out);
            for (ItemWritable item : items) {
                item.write(out);
            }
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            int itemsSize = in.readInt();
            mark.readFields(in);
            for (int i = 0; i < itemsSize; i++) {
                ItemWritable item = new ItemWritable();
                item.readFields(in);
                items.add(item);
            }
        }

        public int getMark() {
            return mark.get();
        }

        public void setMark(int mark) {
            this.mark = new IntWritable(mark);
        }

        public List<ItemWritable> getItems() {
            return items;
        }

        public void setItems(List<ItemWritable> items) {
            this.items = items;
        }
    }

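When I ran this code as a job on the cluster, the reduce phase reached 66% and then practically stopped making progress. While investigating, I reasoned that items should never hold more than 100 elements per record, so the computation had no reason to slow down. To check this, I printed some diagnostics from the program, including the size of items. The result puzzled me: the size of items was the running total of all the records read so far.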

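After going through the code carefully, it suddenly hit me: during reduce, MapReduce reuses the Writable instance instead of creating a new one for each record, to save time. My class had no mechanism to clear items, so every call to readFields kept appending to the end of the same list.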
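The accumulation can be reproduced without a cluster. The following is a minimal sketch, not the framework's actual code: `LeakyRecord`, `encode` and `sizesAfterEachRead` are hypothetical names, plain `int` values stand in for `ItemWritable`, and the loop only imitates the way one instance is handed to `readFields` repeatedly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    /** Hypothetical stand-in for CFWritable: readFields appends without clearing. */
    static class LeakyRecord {
        final List<Integer> items = new ArrayList<>();

        void readFields(DataInput in) throws IOException {
            int size = in.readInt();
            for (int i = 0; i < size; i++) {
                items.add(in.readInt());
            }
        }
    }

    /** Serializes one record as: item count, then each item. */
    static byte[] encode(int... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(values.length);
        for (int v : values) {
            out.writeInt(v);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads two records into the SAME instance and records the list size after each read. */
    static List<Integer> sizesAfterEachRead() {
        try {
            LeakyRecord reused = new LeakyRecord();
            List<Integer> sizes = new ArrayList<>();
            byte[][] records = { encode(1, 2), encode(3, 4, 5) };
            for (byte[] record : records) {
                reused.readFields(new DataInputStream(new ByteArrayInputStream(record)));
                sizes.add(reused.items.size());
            }
            return sizes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sizesAfterEachRead()); // prints [2, 5]
    }
}
```

The second record carries only three items, yet the list reports five, which matches the "running total" seen in the job's diagnostics.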

With the problem identified, I changed the code as follows:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Writable;

    public class CFWritable implements Writable {
        private IntWritable mark; // flag
        private List<ItemWritable> items;

        public CFWritable() {
            mark = new IntWritable(0);
            items = new ArrayList<ItemWritable>(2);
        }

        public CFWritable(int mark, List<ItemWritable> items) {
            this.mark = new IntWritable(mark);
            this.items = items;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            // Layout: item count, then the flag, then each item
            out.writeInt(items.size());
            mark.write(out);
            for (ItemWritable item : items) {
                item.write(out);
            }
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            clear(); // first clear the values left over from the previous call
            int itemsSize = in.readInt();
            mark.readFields(in);
            for (int i = 0; i < itemsSize; i++) {
                ItemWritable item = new ItemWritable();
                item.readFields(in);
                items.add(item);
            }
        }

        public void clear() {
            items.clear();
        }

        public int getMark() {
            return mark.get();
        }

        public void setMark(int mark) {
            this.mark = new IntWritable(mark);
        }

        public List<ItemWritable> getItems() {
            return items;
        }

        public void setItems(List<ItemWritable> items) {
            this.items = items;
        }
    }
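The effect of resetting state at the top of `readFields` can be checked locally. This is a minimal sketch under the same kind of simplification: `ClearedRecord`, `encode` and `sizesAfterEachRead` are hypothetical names, plain `int` values stand in for `ItemWritable`, and the loop only imitates a framework that deserializes every record into one reused instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ClearDemo {
    /** Hypothetical stand-in for the fixed CFWritable: state is reset before reading. */
    static class ClearedRecord {
        final List<Integer> items = new ArrayList<>();

        void readFields(DataInput in) throws IOException {
            items.clear(); // drop whatever the previous record left behind
            int size = in.readInt();
            for (int i = 0; i < size; i++) {
                items.add(in.readInt());
            }
        }
    }

    /** Serializes one record as: item count, then each item. */
    static byte[] encode(int... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(values.length);
        for (int v : values) {
            out.writeInt(v);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads two records into the SAME instance and records the list size after each read. */
    static List<Integer> sizesAfterEachRead() {
        try {
            ClearedRecord reused = new ClearedRecord();
            List<Integer> sizes = new ArrayList<>();
            byte[][] records = { encode(1, 2), encode(3, 4, 5) };
            for (byte[] record : records) {
                reused.readFields(new DataInputStream(new ByteArrayInputStream(record)));
                sizes.add(reused.items.size());
            }
            return sizes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sizesAfterEachRead()); // prints [2, 3]
    }
}
```

Each size now reflects only the record that was just read, so a reused instance behaves the same as a freshly constructed one.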

