Lucene Study Notes, Part 7: An Analysis of the Lucene Search Process (4)
Source: Internet  Editor: 程序博客网  Time: 2024/06/09 20:12
2.4 Searching with the Query Object
2.4.1.2 Creating the Weight Object Tree
BooleanQuery.createWeight(Searcher) ultimately returns new BooleanWeight(searcher). The BooleanWeight constructor is implemented as follows:
public BooleanWeight(Searcher searcher) {
    this.similarity = getSimilarity(searcher);
    weights = new ArrayList<Weight>(clauses.size());
    // This is also recursive: it walks the rewritten Query object tree down to the leaf nodes.
    for (int i = 0; i < clauses.size(); i++) {
        weights.add(clauses.get(i).getQuery().createWeight(searcher));
    }
}
For a TermQuery leaf node, TermQuery.createWeight(Searcher) returns new TermWeight(searcher). The TermWeight constructor is as follows:
public TermWeight(Searcher searcher) {
    this.similarity = getSimilarity(searcher);
    // idf is computed here
    idfExp = similarity.idfExplain(term, searcher);
    idf = idfExp.getIdf();
}
// The idf computation matches the formula in the documentation exactly:
// idf(t) = 1 + ln(numDocs / (docFreq + 1))
public IDFExplanation idfExplain(final Term term, final Searcher searcher) {
    final int df = searcher.docFreq(term);
    final int max = searcher.maxDoc();
    final float idf = idf(df, max);
    return new IDFExplanation() {
        public float getIdf() {
            return idf;
        }
    };
}
public float idf(int docFreq, int numDocs) {
    return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
}
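To make the idf arithmetic concrete, here is a minimal standalone sketch that copies only the idf(int, int) method shown above. The class name IdfSketch and the index statistics (12 documents, a term found in 3 of them) are assumptions chosen for illustration; the source does not state the real statistics of the index behind the dump below.

```java
class IdfSketch {
    // Same arithmetic as the idf(int, int) method quoted above.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // Hypothetical statistics: 12 documents, the term occurs in 3 of them.
        // ln(12 / 4) + 1 = ln(3) + 1, roughly 2.0986.
        System.out.println(idf(3, 12));
        // A term found in 11 of 12 documents: ln(12 / 12) + 1 = 1.0.
        System.out.println(idf(11, 12));
    }
}
```

The rarer the term, the larger its idf; the +1 inside the logarithm's denominator avoids a division by zero when docFreq is 0.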
ConstantScoreQuery.createWeight(Searcher), by contrast, only creates a ConstantScoreQuery.ConstantWeight(searcher) object and does not compute idf.
The resulting Weight object tree is as follows:
weight BooleanQuery$BooleanWeight (id=169)
| similarity DefaultSimilarity (id=177)
| this$0 BooleanQuery (id=89)
| weights ArrayList<E> (id=188)
| elementData Object[3] (id=190)
|------[0] BooleanQuery$BooleanWeight (id=171)
| | similarity DefaultSimilarity (id=177)
| | this$0 BooleanQuery (id=105)
| | weights ArrayList<E> (id=193)
| | elementData Object[2] (id=199)
| |------[0] ConstantScoreQuery$ConstantWeight (id=183)
| | queryNorm 0.0
| | queryWeight 0.0
| | similarity DefaultSimilarity (id=177)
| | //ConstantScore(contents:apple*)
| | this$0 ConstantScoreQuery (id=123)
| |------[1] TermQuery$TermWeight (id=175)
| idf 2.0986123
| idfExp Similarity$1 (id=241)
| queryNorm 0.0
| queryWeight 0.0
| similarity DefaultSimilarity (id=177)
| //contents:boy
| this$0 TermQuery (id=124)
| value 0.0
| modCount 2
| size 2
|------[1] BooleanQuery$BooleanWeight (id=179)
| | similarity DefaultSimilarity (id=177)
| | this$0 BooleanQuery (id=110)
| | weights ArrayList<E> (id=195)
| | elementData Object[2] (id=204)
| |------[0] ConstantScoreQuery$ConstantWeight (id=206)
| | queryNorm 0.0
| | queryWeight 0.0
| | similarity DefaultSimilarity (id=177)
| | //ConstantScore(contents:cat*)
| | this$0 ConstantScoreQuery (id=135)
| |------[1] TermQuery$TermWeight (id=207)
| idf 1.5389965
| idfExp Similarity$1 (id=210)
| queryNorm 0.0
| queryWeight 0.0
| similarity DefaultSimilarity (id=177)
| //contents:dog
| this$0 TermQuery (id=136)
| value 0.0
| modCount 2
| size 2
|------[2] BooleanQuery$BooleanWeight (id=182)
| similarity DefaultSimilarity (id=177)
| this$0 BooleanQuery (id=113)
| weights ArrayList<E> (id=197)
| elementData Object[2] (id=216)
|------[0] BooleanQuery$BooleanWeight (id=181)
| | similarity BooleanQuery$1 (id=220)
| | this$0 BooleanQuery (id=145)
| | weights ArrayList<E> (id=221)
| | elementData Object[2] (id=224)
| |------[0] TermQuery$TermWeight (id=226)
| | idf 2.0986123
| | idfExp Similarity$1 (id=229)
| | queryNorm 0.0
| | queryWeight 0.0
| | similarity DefaultSimilarity (id=177)
| | //contents:eat
| | this$0 TermQuery (id=150)
| | value 0.0
| |------[1] TermQuery$TermWeight (id=227)
| idf 1.1823215
| idfExp Similarity$1 (id=231)
| queryNorm 0.0
| queryWeight 0.0
| similarity DefaultSimilarity (id=177)
| //contents:cat^0.33333325
| this$0 TermQuery (id=151)
| value 0.0
| modCount 2
| size 2
|------[1] TermQuery$TermWeight (id=218)
idf 2.0986123
idfExp Similarity$1 (id=233)
queryNorm 0.0
queryWeight 0.0
similarity DefaultSimilarity (id=177)
//contents:foods
this$0 TermQuery (id=154)
value 0.0
modCount 2
size 2
modCount 3
size 3
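2.4.1.3 Computing the Term Weight Score
(1) First, compute sumOfSquaredWeights
According to the formula:
$\text{sumOfSquaredWeights} = q.\text{getBoost}()^2 \cdot \sum_{t \in q} \big(\text{idf}(t) \cdot t.\text{getBoost}()\big)^2$
The code is as follows: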
float sum = weight.sumOfSquaredWeights();
// As you can see, this is also a recursive process.
public float sumOfSquaredWeights() throws IOException {
    float sum = 0.0f;
    for (int i = 0; i < weights.size(); i++) {
        float s = weights.get(i).sumOfSquaredWeights();
        // Prohibited (MUST_NOT) clauses do not contribute to the sum.
        if (!clauses.get(i).isProhibited())
            sum += s;
    }
    sum *= getBoost() * getBoost(); // multiply by the square of the query boost
    return sum;
}
For a TermWeight leaf node, TermQuery$TermWeight.sumOfSquaredWeights() is implemented as follows:
public float sumOfSquaredWeights() {
    // Compute part of the score, idf * t.getBoost(); it is used again later.
    queryWeight = idf * getBoost();
    // Compute (idf * t.getBoost())^2
    return queryWeight * queryWeight;
}
For a ConstantWeight leaf node, ConstantScoreQuery$ConstantWeight.sumOfSquaredWeights() is as follows:
public float sumOfSquaredWeights() {
    // Only the user-specified boost counts toward the score; nothing else does.
    queryWeight = getBoost();
    return queryWeight * queryWeight;
}
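The recursion across the three kinds of weights can be sketched with a toy tree. This is a simplified illustration, not Lucene's classes: W, TermW, and BoolW are hypothetical names, ConstantWeight is left out, and the idf and boost values are made up so the arithmetic is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

class SumOfSquaresSketch {
    // Hypothetical stand-in for Lucene's Weight, reduced to one method.
    interface W {
        float sumOfSquaredWeights();
    }

    // Leaf: mirrors TermWeight, returning (idf * boost)^2.
    static final class TermW implements W {
        final float idf;
        final float boost;

        TermW(float idf, float boost) {
            this.idf = idf;
            this.boost = boost;
        }

        public float sumOfSquaredWeights() {
            float queryWeight = idf * boost;
            return queryWeight * queryWeight;
        }
    }

    // Inner node: mirrors BooleanWeight, skipping prohibited clauses.
    static final class BoolW implements W {
        final float boost;
        final List<W> children = new ArrayList<>();
        final List<Boolean> prohibited = new ArrayList<>();

        BoolW(float boost) {
            this.boost = boost;
        }

        BoolW add(W child, boolean isProhibited) {
            children.add(child);
            prohibited.add(isProhibited);
            return this;
        }

        public float sumOfSquaredWeights() {
            float sum = 0.0f;
            for (int i = 0; i < children.size(); i++) {
                float s = children.get(i).sumOfSquaredWeights();
                if (!prohibited.get(i)) {
                    sum += s;
                }
            }
            return sum * boost * boost;
        }
    }

    static float demo() {
        W tree = new BoolW(1.0f)
                .add(new TermW(2.0f, 1.0f), false)  // contributes 2^2 = 4
                .add(new TermW(3.0f, 1.0f), true)   // prohibited, ignored
                .add(new BoolW(2.0f)                // nested boost 2 -> factor 4
                        .add(new TermW(1.0f, 1.0f), false), false); // 1 * 4 = 4
        return tree.sumOfSquaredWeights();          // 4 + 4 = 8
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 8.0
    }
}
```

Note that the child's sumOfSquaredWeights() is still called for a prohibited clause, as in the original loop; only its contribution to the sum is dropped.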
(2) Compute queryNorm
The formula is:
$\text{queryNorm}(q) = \dfrac{1}{\sqrt{\text{sumOfSquaredWeights}}}$
The code is:
public float queryNorm(float sumOfSquaredWeights) {
    return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
}
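One way to read queryNorm is that it rescales the query's weights so that their squares sum to 1. The following sketch checks this numerically; QueryNormSketch and normalizedSquaredLength are hypothetical names, and the raw weights 3 and 4 are made-up values.

```java
class QueryNormSketch {
    // Same arithmetic as the queryNorm(float) method quoted above.
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    // Scales every raw weight by queryNorm and returns the sum of the squared results.
    static float normalizedSquaredLength(float[] rawWeights) {
        float sum = 0.0f;
        for (float w : rawWeights) {
            sum += w * w;
        }
        float norm = queryNorm(sum);
        float scaled = 0.0f;
        for (float w : rawWeights) {
            float n = w * norm;
            scaled += n * n;
        }
        return scaled;
    }

    public static void main(String[] args) {
        float[] raw = {3.0f, 4.0f};                       // 3^2 + 4^2 = 25
        System.out.println(queryNorm(25.0f));             // about 0.2
        System.out.println(normalizedSquaredLength(raw)); // about 1.0
    }
}
```

Because every term of the query is multiplied by the same factor, queryNorm does not change the ranking of documents for a single query.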
(3) Fold queryNorm into the score
The code is:
weight.normalize(norm);
// Again a recursive process.
public void normalize(float norm) {
    norm *= getBoost();
    for (Weight w : weights) {
        w.normalize(norm);
    }
}
For a TermWeight leaf node, TermQuery$TermWeight.normalize(float) is as follows:
public void normalize(float queryNorm) {
    this.queryNorm = queryNorm;
    // queryWeight was idf * t.getBoost(); it is now queryNorm * idf * t.getBoost().
    queryWeight *= queryNorm;
    // At this point the score covers queryNorm * idf * t.getBoost() * idf = queryNorm * idf^2 * t.getBoost().
    value = queryWeight * idf;
}
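The two phases on a single term can be traced end to end with a small sketch. TermWeightSketch is a hypothetical class that keeps only the fields used in the snippets above, and idf = 2, boost = 2 are made-up values picked so every step is exact in floating point.

```java
class TermWeightSketch {
    final float idf;
    final float boost;
    float queryNorm;
    float queryWeight;
    float value;

    TermWeightSketch(float idf, float boost) {
        this.idf = idf;
        this.boost = boost;
    }

    // Phase 1: remember idf * boost and return its square.
    float sumOfSquaredWeights() {
        queryWeight = idf * boost;
        return queryWeight * queryWeight;
    }

    // Phase 2: fold queryNorm in, leaving value = queryNorm * idf^2 * boost.
    void normalize(float norm) {
        this.queryNorm = norm;
        queryWeight *= norm;
        value = queryWeight * idf;
    }

    static float queryNormOf(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    // Runs both phases for a query consisting of a single term.
    static TermWeightSketch run(float idf, float boost) {
        TermWeightSketch w = new TermWeightSketch(idf, boost);
        float sum = w.sumOfSquaredWeights();   // (2 * 2)^2 = 16
        w.normalize(queryNormOf(sum));         // queryNorm = 1 / 4 = 0.25
        return w;
    }

    public static void main(String[] args) {
        TermWeightSketch w = run(2.0f, 2.0f);
        // value = 0.25 * 2^2 * 2 = 2.0
        System.out.println(w.queryNorm + " " + w.queryWeight + " " + w.value);
    }
}
```

In the dump earlier in this section, queryNorm, queryWeight, and value are all 0.0 because neither phase had run when the tree was captured.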
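As we know, Lucene's overall scoring formula is shown below. Up to this point, the part highlighted in red in the original figure has been computed, namely queryNorm(q) · idf(t)^2 · t.getBoost():
$\text{score}(q,d) = \text{coord}(q,d) \cdot \text{queryNorm}(q) \cdot \sum_{t \in q} \big(\text{tf}(t \in d) \cdot \text{idf}(t)^2 \cdot t.\text{getBoost}() \cdot \text{norm}(t,d)\big)$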