SpatialHadoop 2.3 Source Code Reading (8): RTree Index Generation (2)

Source: Internet  Editor: 程序博客网  Time: 2024/05/22 09:50

This chapter walks through the concrete MapReduce implementation.

1. Map

  /**
   * The map class maps each object to the cell with maximum overlap.
   * @author Ahmed Eldawy
   *
   */
  public static class RepartitionMapNoReplication<T extends Shape> extends MapReduceBase
      implements Mapper<Rectangle, T, IntWritable, T> {
    /**List of cells used by the mapper*/
    private CellInfo[] cellInfos;

    /**Used to output intermediate records*/
    private IntWritable cellId = new IntWritable();

    @Override
    public void configure(JobConf job) {
      try {
        cellInfos = SpatialSite.getCells(job);
        super.configure(job);
      } catch (IOException e) {
        throw new RuntimeException("Error loading cells", e);
      }
    }

    /**
     * Map function
     * @param dummy
     * @param shape
     * @param output
     * @param reporter
     * @throws IOException
     */
    public void map(Rectangle cellMbr, T shape,
        OutputCollector<IntWritable, T> output, Reporter reporter)
        throws IOException {
      Rectangle shape_mbr = shape.getMBR();
      if (shape_mbr == null)
        return;
      double maxOverlap = -1.0;
      int bestCell = -1;
      // Only send shape to output if its lowest corner lies in the cellMBR
      // This ensures that a replicated shape in an already partitioned file
      // doesn't get send to output from all partitions
      if (!cellMbr.isValid() || cellMbr.contains(shape_mbr.x1, shape_mbr.y1)) {
        for (int cellIndex = 0; cellIndex < cellInfos.length; cellIndex++) {
          Rectangle overlap = cellInfos[cellIndex].getIntersection(shape_mbr);
          if (overlap != null) {
            double overlapArea = overlap.getWidth() * overlap.getHeight();
            if (bestCell == -1 || overlapArea > maxOverlap) {
              maxOverlap = overlapArea;
              bestCell = cellIndex;
            }
          }
        }
      }
      if (bestCell != -1) {
        cellId.set((int) cellInfos[bestCell].cellId);
        output.collect(cellId, shape);
      } else {
        LOG.warn("Shape: "+shape+" doesn't overlap any partitions");
      }
    }
  }
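The Map class has two main parts: the configure method and the map method.

The configure method's main job is to load the CellInfo array described in the previous chapter.

The rest of this section focuses on the map method.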

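Line 35 (shape.getMBR()): get the minimum bounding rectangle (MBR) of the input record.

Line 43: unlike in SpatialHadoop 2.3 Source Code Reading (6): Grid Index Generation (2), cellMbr here has the value Rectangle: (NaN,0.0)-(0.0,0.0), so the first half of the if condition, !cellMbr.isValid(), is always true.

Lines 44-51: iterate over all cells and find the one whose intersection with the input record has the largest area; the record is assigned to that cell.

Line 45: compute the intersection rectangle of the current cell and the input record; it is null if they do not intersect.

Lines 47-50: if they do intersect, compare the overlap area with the largest area found so far and update the best cell when the new area is larger.

Lines 55-57: emit the cell ID of the chosen cell together with the input record.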
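
The selection logic of lines 43-54 can be tried outside Hadoop with a minimal sketch. Everything here is an assumption made for illustration: Rect is a hypothetical stand-in for SpatialHadoop's Rectangle, its isValid() is assumed to treat a NaN x1 as "not set" (which matches the (NaN,0.0)-(0.0,0.0) value above), and bestCell returns a list index rather than a real cellId.

```java
import java.util.List;

final class MaxOverlapSketch {
    /** Returns the index of the cell with the largest overlap, or -1 if none qualifies. */
    static int bestCell(Rect partitionMbr, Rect shapeMbr, List<Rect> cells) {
        int best = -1;
        double maxOverlap = -1.0;
        // Scan the cells unless the partition MBR is set and does not
        // contain the lowest corner of the shape.
        if (!partitionMbr.isValid() || partitionMbr.contains(shapeMbr.x1, shapeMbr.y1)) {
            for (int i = 0; i < cells.size(); i++) {
                Rect overlap = cells.get(i).intersection(shapeMbr);
                if (overlap != null && (best == -1 || overlap.area() > maxOverlap)) {
                    maxOverlap = overlap.area();
                    best = i;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Rect> cells = List.of(new Rect(0, 0, 10, 10), new Rect(10, 0, 20, 10));
        Rect unset = new Rect(Double.NaN, 0, 0, 0); // mirrors (NaN,0.0)-(0.0,0.0)
        Rect shape = new Rect(8, 2, 15, 4);         // overlaps cell 0 by 2x2 and cell 1 by 5x2
        System.out.println(bestCell(unset, shape, cells));
    }
}

/** Hypothetical stand-in for SpatialHadoop's Rectangle; not the library class. */
final class Rect {
    final double x1, y1, x2, y2;

    Rect(double x1, double y1, double x2, double y2) {
        this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
    }

    /** Assumption: a rectangle whose x1 is NaN counts as "not set". */
    boolean isValid() {
        return !Double.isNaN(x1);
    }

    /** Half-open containment test: [x1, x2) x [y1, y2). */
    boolean contains(double x, double y) {
        return x >= x1 && x < x2 && y >= y1 && y < y2;
    }

    /** Returns the intersection, or null when the rectangles do not overlap. */
    Rect intersection(Rect o) {
        double ix1 = Math.max(x1, o.x1), iy1 = Math.max(y1, o.y1);
        double ix2 = Math.min(x2, o.x2), iy2 = Math.min(y2, o.y2);
        return (ix1 < ix2 && iy1 < iy2) ? new Rect(ix1, iy1, ix2, iy2) : null;
    }

    double area() {
        return (x2 - x1) * (y2 - y1);
    }
}
```

In the sample run the shape straddles both cells but overlaps the second one more, so the second cell is chosen. Because the partition MBR is unset, the corner check never filters anything out, which is the situation described for line 43.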


2. Reduce

The Reduce class is identical to the reducer in SpatialHadoop 2.3 Source Code Reading (6): Grid Index Generation (2); see that chapter for details.



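3. OutputCommitter

The Committer class is identical to the committer in SpatialHadoop 2.3 Source Code Reading (6): Grid Index Generation (2); see that chapter for details.

The committer merges all generated files whose names contain _master into a single file, producing the _master.grid file.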
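
To make the merge step concrete, here is a rough sketch on the local file system. It is not the SpatialHadoop committer and does not use the Hadoop FileSystem API. The class name MasterMergeSketch, the method mergeMasterFiles, and the part file names are hypothetical, and plain line-by-line concatenation is assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class MasterMergeSketch {
    /** Concatenates every regular file in dir whose name contains "_master" into dir/targetName. */
    static Path mergeMasterFiles(Path dir, String targetName) throws IOException {
        Path target = dir.resolve(targetName);
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                // Skip the target itself so a rerun does not merge its own output.
                if (Files.isRegularFile(p) && name.contains("_master") && !name.equals(targetName)) {
                    parts.add(p);
                }
            }
        }
        Collections.sort(parts); // deterministic order across runs
        List<String> lines = new ArrayList<>();
        for (Path part : parts) {
            lines.addAll(Files.readAllLines(part, StandardCharsets.UTF_8));
        }
        Files.write(target, lines, StandardCharsets.UTF_8);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("master-merge");
        // Hypothetical per-task master files, one line per cell.
        Files.write(dir.resolve("part-00000_master"), List.of("cell 1"), StandardCharsets.UTF_8);
        Files.write(dir.resolve("part-00001_master"), List.of("cell 2"), StandardCharsets.UTF_8);
        Path merged = mergeMasterFiles(dir, "_master.grid");
        System.out.println(Files.readAllLines(merged, StandardCharsets.UTF_8));
    }
}
```

The sketch leaves the per-task files in place; whether the real committer deletes them after merging is not covered here.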
