Lucene: a walkthrough of the code that sorts by a field

Source: Internet  Editor: 程序博客网  Date: 2024/05/16 17:09

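In Lucene, an index directory contains multiple index segments, and each segment has its own reader.

These readers are completely independent of one another: each holds its own data and is searched on its own.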
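Because each segment numbers its documents independently, starting from 0, a collector needs an offset to turn a segment-local doc ID into an index-wide one. The collector code in this article does that with `docBase + doc`. The sketch below only illustrates that arithmetic; `DocBaseSketch`, `docBases` and `globalDocId` are hypothetical names, not Lucene API, and the segment sizes are made up.

```java
import java.util.Arrays;

// Hypothetical illustration, not Lucene API: every segment numbers its
// documents from 0, and docBase is the offset that makes them index-wide.
final class DocBaseSketch {

    // docBase of segment i = number of documents in segments 0..i-1.
    static int[] docBases(int[] segmentSizes) {
        int[] bases = new int[segmentSizes.length];
        int next = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            bases[i] = next;
            next += segmentSizes[i];
        }
        return bases;
    }

    // Same arithmetic as `bottom.doc = docBase + doc` in the collector.
    static int globalDocId(int docBase, int segmentLocalDoc) {
        return docBase + segmentLocalDoc;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {100, 40, 7});
        System.out.println(Arrays.toString(bases));   // [0, 100, 140]
        System.out.println(globalDocId(bases[1], 5)); // 105
    }
}
```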

Now let's look at how the collector that sorts by a field does its work.





  private static class OneComparatorScoringMaxScoreCollector extends
      OneComparatorNonScoringCollector {

    Scorer scorer;

    public OneComparatorScoringMaxScoreCollector(FieldValueHitQueue<Entry> queue,
        int numHits, boolean fillFields) throws IOException {
      super(queue, numHits, fillFields);
      // Must set maxScore to NEG_INF, or otherwise Math.max always returns NaN.
      maxScore = Float.NEGATIVE_INFINITY;
    }

    final void updateBottom(int doc, float score) {
      bottom.doc = docBase + doc;
      bottom.score = score;
      bottom = pq.updateTop();
    }

    @Override
    public void collect(int doc) throws IOException {
      final float score = scorer.score();
      if (score > maxScore) {
        maxScore = score;
      }
      ++totalHits;
      if (queueFull) {
        if ((reverseMul * comparator.compareBottom(doc)) <= 0) {
          // since docs are visited in doc Id order, if compare is 0, it means
          // this document is largest than anything else in the queue, and
          // therefore not competitive.
          return;
        }

        // This hit is competitive - replace bottom element in queue & adjustTop
        comparator.copy(bottom.slot, doc);
        updateBottom(doc, score);
        comparator.setBottom(bottom.slot);
      } else {
        // Startup transient: queue hasn't gathered numHits yet
        final int slot = totalHits - 1;
        // Copy hit into queue
        comparator.copy(slot, doc);
        add(slot, doc, score);
        if (queueFull) {
          comparator.setBottom(bottom.slot);
        }
      }
    }

    @Override
    public void setScorer(Scorer scorer) throws IOException {
      this.scorer = scorer;
      super.setScorer(scorer);
    }
  }



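While the priority queue is not yet full, each hit's value is copied straight into the values array, and as soon as the queue fills up a bottom (the weakest entry currently in the queue) is set. Once the queue is full, a new hit only has to be compared with bottom.

A hit that does not beat bottom is simply dropped. Ties are dropped too: documents are visited in doc ID order, so a tie means the new document sorts after everything already in the queue. Otherwise the hit replaces bottom in the queue and bottom is updated.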
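The fill-then-compare-with-bottom logic can be sketched in plain Java. This is a simplified stand-in, not Lucene's `FieldValueHitQueue`: it assumes an ascending sort on a plain int value per document, uses `java.util.PriorityQueue` over slot numbers, and `TopNSketch` and `topN` are hypothetical names.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Simplified sketch (assumption: ascending sort, one int value per doc).
final class TopNSketch {

    // Returns the numHits smallest values, in ascending order.
    static int[] topN(int[] docValues, int numHits) {
        if (numHits <= 0) {
            return new int[0];
        }
        final int[] values = new int[numHits]; // one value per queue slot
        // Heap of slot numbers; the head is the weakest entry ("bottom"),
        // i.e. the slot holding the largest value.
        PriorityQueue<Integer> pq = new PriorityQueue<>(
            (a, b) -> Integer.compare(values[b], values[a]));
        int totalHits = 0;
        for (int doc = 0; doc < docValues.length; doc++) {
            totalHits++;
            int v = docValues[doc];
            if (pq.size() == numHits) {
                int bottomSlot = pq.peek();
                if (v >= values[bottomSlot]) {
                    continue; // not competitive; ties lose to earlier docs
                }
                // Competitive: reuse bottom's slot for the new value.
                pq.poll();
                values[bottomSlot] = v;
                pq.add(bottomSlot);
            } else {
                // Startup phase: slots are handed out in arrival order.
                int slot = totalHits - 1;
                values[slot] = v;
                pq.add(slot);
            }
        }
        int[] result = Arrays.copyOf(values, pq.size());
        Arrays.sort(result);
        return result;
    }
}
```

For example, `TopNSketch.topN(new int[] {5, 3, 9, 1, 7, 3}, 3)` keeps the values 1, 3 and 3. As in the collector, the slot of the evicted bottom entry is reused, so the values array never grows beyond numHits.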


For now, let's use the comparator for numeric fields, IntComparator.

    IntComparator(int numHits, String field, FieldCache.Parser parser, Integer missingValue) {
      super(field, missingValue);
      values = new int[numHits];
      this.parser = (IntParser) parser;
    }

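The constructor allocates an array, values, of length numHits. It holds the values of the entries currently in the heap, and the priority queue compares entries through it. When a hit document qualifies for the queue, the comparator needs that document's value for the sort field. How does it get it?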


    @Override
    public void setNextReader(IndexReader reader, int docBase) throws IOException {
      // NOTE: must do this before calling super otherwise
      // we compute the docsWithField Bits twice!
      currentReaderValues = FieldCache.DEFAULT.getInts(reader, field, parser, missingValue != null);
      super.setNextReader(reader, docBase);
    }

Here the comparator fetches, from the field cache, all of this reader's values for the field and keeps them in the currentReaderValues array.

currentReaderValues[doc] then gives that doc's field value directly, and copy stores it in the values array:

    public void copy(int slot, int doc) {
      int v2 = currentReaderValues[doc];
      // Test for v2 == 0 to save Bits.get method call for
      // the common case (doc has value and value is non-zero):
      if (docsWithField != null && v2 == 0 && !docsWithField.get(doc)) {
        v2 = missingValue;
      }
      values[slot] = v2;
    }
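To see how the pieces fit together, here is a minimal stand-alone sketch of such a comparator. It is not the Lucene class: `MiniIntComparator` and `setNextSegment` are hypothetical names, a segment is modeled as a plain `int[]` plus a `java.util.BitSet` of documents that actually have a value, and the field cache lookup is left out.

```java
import java.util.BitSet;

// Hypothetical stand-in for IntComparator; a segment is just an int[] here.
final class MiniIntComparator {
    private final int[] values;          // value held by each queue slot
    private final Integer missingValue;  // null means "no substitution"
    private int[] currentReaderValues;   // values of the current segment
    private BitSet docsWithField;        // docs that really have a value
    private int bottom;                  // value of the weakest queue entry

    MiniIntComparator(int numHits, Integer missingValue) {
        this.values = new int[numHits];
        this.missingValue = missingValue;
    }

    // Plays the role of setNextReader: switch to the next segment's values.
    void setNextSegment(int[] segmentValues, BitSet segmentDocsWithField) {
        this.currentReaderValues = segmentValues;
        // The bit set is only needed when a missing value was configured.
        this.docsWithField = (missingValue != null) ? segmentDocsWithField : null;
    }

    private int valueOf(int doc) {
        int v = currentReaderValues[doc];
        // A stored 0 is ambiguous: a real zero, or no value at all.
        if (docsWithField != null && v == 0 && !docsWithField.get(doc)) {
            v = missingValue;
        }
        return v;
    }

    void copy(int slot, int doc) { values[slot] = valueOf(doc); }

    void setBottom(int slot) { bottom = values[slot]; }

    // Positive result: doc sorts before bottom (ascending), so it is competitive.
    int compareBottom(int doc) { return Integer.compare(bottom, valueOf(doc)); }

    int value(int slot) { return values[slot]; }
}
```

The point of the slot indirection is that the queue never stores doc IDs from which values would have to be looked up again: once a value is copied into a slot it stays valid even after the comparator has moved on to the next segment.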







