lucene-2.9.0 Indexing Process (4): The Merge Process
Lucene 2.9.0 uses a logarithmic merge policy (LogMergePolicy).

Earlier Lucene releases drove index merging by document count, using an immediate-merge strategy controlled by the merge factor (mergeFactor):
1. When the number of buffered in-memory documents reaches mergeFactor, the in-memory index is flushed to disk; the new segment contains mergeFactor documents.
2. The initial merge unit is mergeDocs = mergeFactor.
3. Once the disk holds mergeFactor segments of mergeDocs documents each, they are merged into one new segment of mergeFactor * mergeDocs documents.
4. mergeDocs is then updated to mergeFactor * mergeDocs; if condition 3 holds again, merging cascades recursively until nothing more can be merged.
5. Calling optimize() merges the entire index into one segment, without requiring the mergeFactor condition to hold.
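The document-count cascade above can be sketched as a toy simulation. This is not Lucene source; the segment bookkeeping (a list of per-segment document counts) is invented purely to illustrate the recursive merge of steps 1-4:

```java
// Toy model of the pre-2.9 document-count-driven merge cascade: every
// mergeFactor segments of the same generation merge into one segment that is
// mergeFactor times larger, and the merge recurses generation by generation.
import java.util.ArrayList;
import java.util.List;

public class DocCountCascade {
    static final int MERGE_FACTOR = 10;

    // Each entry is one on-disk segment's document count.
    static List<Integer> segments = new ArrayList<>();

    // Flush a new in-memory segment of MERGE_FACTOR docs, then cascade merges.
    static void flush() {
        segments.add(MERGE_FACTOR);
        int mergeDocs = MERGE_FACTOR;
        // While MERGE_FACTOR segments of mergeDocs docs each exist, merge them.
        while (countOf(mergeDocs) >= MERGE_FACTOR) {
            removeN(mergeDocs, MERGE_FACTOR);
            segments.add(mergeDocs * MERGE_FACTOR);
            mergeDocs *= MERGE_FACTOR; // next generation, as in step 4 above
        }
    }

    static int countOf(int docs) {
        int n = 0;
        for (int d : segments) if (d == docs) n++;
        return n;
    }

    static void removeN(int docs, int n) {
        for (int i = segments.size() - 1; i >= 0 && n > 0; i--)
            if (segments.get(i) == docs) { segments.remove(i); n--; }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) flush(); // 100 flushes of 10 docs each
        System.out.println(segments);          // cascades down to [1000]
    }
}
```

After 10 flushes the ten 10-doc segments collapse into one 100-doc segment; after 100 flushes the ten 100-doc segments collapse again, leaving a single 1000-doc segment.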
Lucene 2.9.0 instead uses a memory-driven trigger. A RAM buffer size is configured (IndexWriter.DEFAULT_RAM_BUFFER_SIZE_MB), and when that budget is exhausted:
1. The in-memory index is flushed to disk.
2. A possible merge is triggered.
3. The merge policy is logarithmic.
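A toy illustration of that RAM-driven trigger follows. The 16 MB budget mirrors the historical DEFAULT_RAM_BUFFER_SIZE_MB, but the per-document byte accounting here is an invented simplification, not Lucene's internal cost model:

```java
// Toy model of the 2.9 flush trigger: documents accumulate an estimated byte
// cost, and a flush fires only when the RAM budget is exhausted, regardless
// of how many documents are buffered.
public class RamTrigger {
    static final long RAM_BUDGET = 16L * 1024 * 1024; // assumed 16 MB budget
    static long used = 0;
    static int flushes = 0;

    // Returns true when adding this document triggered a flush.
    static boolean addDocument(long estimatedBytes) {
        used += estimatedBytes;
        if (used >= RAM_BUDGET) {
            used = 0;  // step 1: write the in-memory segment to disk
            flushes++; // step 2: a maybeMerge() would follow here
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 20000; i++) addDocument(1024); // 1 KB per doc
        System.out.println("flushes = " + flushes);        // prints "flushes = 1"
    }
}
```

With 1 KB documents, the flush fires after 16384 documents (16 MiB), so 20000 additions produce exactly one flush; the old document-count policy would instead have flushed every mergeFactor documents.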
////////////////////////////////////////////
The index merge process
Merging is performed by a separate, dedicated thread.
IndexWriter.addDocument(Document) line: 2428
IndexWriter.addDocument(Document, Analyzer) line: 2475
IndexWriter.flush(boolean, boolean, boolean) line: 4167
IndexWriter.maybeMerge() line: 2990
IndexWriter.maybeMerge(boolean) line: 2994
IndexWriter.maybeMerge(int, boolean) line: 2998
IndexWriter.updatePendingMerges(int, boolean) line: 3028
LogByteSizeMergePolicy(LogMergePolicy).findMerges(SegmentInfos) line: 444
// There are merges to run
if (spec != null) {
    final int numMerges = spec.merges.size();
    for (int i = 0; i < numMerges; i++)
        registerMerge((MergePolicy.OneMerge) spec.merges.get(i));
}
One thread continues indexing documents while another thread performs the merge:
SegmentMerger.merge(boolean) line: 153
IndexWriter.mergeMiddle(MergePolicy$OneMerge) line: 5012
IndexWriter.merge(MergePolicy$OneMerge) line: 4597
ConcurrentMergeScheduler.doMerge(MergePolicy$OneMerge) line: 235
ConcurrentMergeScheduler$MergeThread.run() line: 291
Thread [main] (Running) // the main thread keeps indexing
// When a merge is pending, a new thread is spawned
Its call stack is as follows:
ConcurrentMergeScheduler$MergeThread.run() line: 291
ConcurrentMergeScheduler.doMerge(MergePolicy$OneMerge) line: 235
IndexWriter.merge(MergePolicy$OneMerge) line: 4597
IndexWriter.mergeMiddle(MergePolicy$OneMerge) line: 5012
SegmentMerger.merge(boolean) line: 153
Thread [Lucene Merge Thread #0] (Suspended (breakpoint at line 153 in SegmentMerger))
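The two-thread pattern shown in the stack traces can be sketched as follows. This is a generic producer/consumer skeleton, not Lucene code; the queue, thread name, and "STOP" sentinel are all invented for illustration:

```java
// Minimal sketch of the pattern above: the main thread keeps "indexing" and
// registering merges, while a dedicated merge thread (analogous to Lucene's
// ConcurrentMergeScheduler$MergeThread) drains a queue of pending merges.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class TwoThreadMerge {
    static final BlockingQueue<String> pendingMerges = new LinkedBlockingQueue<>();
    static final AtomicInteger merged = new AtomicInteger();

    public static void main(String[] args) {
        try {
            Thread mergeThread = new Thread(() -> {
                try {
                    String m;
                    while (!(m = pendingMerges.take()).equals("STOP"))
                        merged.incrementAndGet(); // a doMerge(m) would run here
                } catch (InterruptedException ignored) {}
            }, "Lucene Merge Thread #0");
            mergeThread.start();

            // Main thread keeps indexing; a registerMerge() enqueues work.
            for (int i = 0; i < 5; i++) pendingMerges.put("merge-" + i);
            pendingMerges.put("STOP");
            mergeThread.join();
            System.out.println("merges done = " + merged.get());
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The key property this models is that registering a merge is cheap for the indexing thread; the expensive SegmentMerger.merge() work happens entirely on the background thread.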
/////////////////////////////////////////////////////////////////
This method decides which segments to merge. It computes a level value for each segment based on its size, then merges segments whose levels fall into the same interval, once at least mergeFactor of them have accumulated.
public MergeSpecification findMerges(SegmentInfos infos) throws IOException {
    final int numSegments = infos.size();

    // Compute levels, which is just log (base mergeFactor)
    // of the size of each segment
    float[] levels = new float[numSegments];
    final float norm = (float) Math.log(mergeFactor);

    for (int i = 0; i < numSegments; i++) {
        final SegmentInfo info = infos.info(i);
        long size = size(info);
        // The level is computed from the segment's file sizes (.cfs / .fdt / .fdx files)

        // Floor tiny segments
        if (size < 1)
            size = 1;
        levels[i] = (float) Math.log(size) / norm;
    }

    final float levelFloor;
    if (minMergeSize <= 0) // minMergeSize is a preset value
        levelFloor = (float) 0.0;
    else
        levelFloor = (float) (Math.log(minMergeSize) / norm);

    // Now, we quantize the log values into levels. The
    // first level is any segment whose log size is within
    // LEVEL_LOG_SPAN of the max size, or, who has such as
    // segment "to the right". Then, we find the max of all
    // other segments and use that to define the next level
    // segment, etc.

    MergeSpecification spec = null;

    int start = 0;
    // Walk all segments, merging those at the same level
    while (start < numSegments) {

        // Find max level of all segments not already
        // quantized.
        float maxLevel = levels[start];
        // One detail: when segments are merged, the original sub-segments are
        // removed, so the new segment ends up at a smaller index.
        for (int i = 1 + start; i < numSegments; i++) {
            final float level = levels[i];
            if (level > maxLevel)
                maxLevel = level;
        }

        // Now search backwards for the rightmost segment that
        // falls into this level:
        float levelBottom;
        if (maxLevel < levelFloor)
            // All remaining segments fall into the min level
            levelBottom = -1.0F;
        else {
            // LEVEL_LOG_SPAN = 0.75: subtracting it from the max level forms the merge-level interval
            levelBottom = (float) (maxLevel - LEVEL_LOG_SPAN);

            // Force a boundary at the level floor
            if (levelBottom < levelFloor && maxLevel >= levelFloor)
                levelBottom = levelFloor;
        }

        int upto = numSegments - 1;
        // Find the rightmost segment that still falls into this level interval
        while (upto >= start) {
            if (levels[upto] >= levelBottom) {
                break;
            }
            upto--;
        }

        if (verbose())
            message("  level " + levelBottom + " to " + maxLevel + ": " + (1 + upto - start) + " segments");

        // Finally, record all merges that are viable at this level:
        int end = start + mergeFactor; // mergeFactor is the merge coefficient
        // The interval contains segments that need merging
        while (end <= 1 + upto) {
            boolean anyTooLarge = false;
            for (int i = start; i < end; i++) {
                final SegmentInfo info = infos.info(i);
                anyTooLarge |= (size(info) >= maxMergeSize || sizeDocs(info) >= maxMergeDocs);
            }

            if (!anyTooLarge) {
                if (spec == null)
                    spec = new MergeSpecification();
                if (verbose())
                    message("  " + start + " to " + end + ": add this merge");
                spec.add(new OneMerge(infos.range(start, end), useCompoundFile));
            } else if (verbose())
                message("  " + start + " to " + end + ": contains segment over maxMergeSize or maxMergeDocs; skipping");

            start = end;
            end = start + mergeFactor;
        }

        start = 1 + upto;
    }

    return spec;
}
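The level math in findMerges can be rehearsed on synthetic segment sizes in a self-contained snippet. The sizes below are made up for illustration; only the formula level = log(size) / log(mergeFactor) and the LEVEL_LOG_SPAN = 0.75 window mirror the code above:

```java
// Standalone rehearsal of the LogMergePolicy level computation: segments whose
// level lies within LEVEL_LOG_SPAN of the maximum level fall into the same
// merge tier (with mergeFactor = 10, the level is just log10 of the size).
public class LevelDemo {
    static final double LEVEL_LOG_SPAN = 0.75;
    static final int MERGE_FACTOR = 10;

    static double level(long sizeBytes) {
        long size = Math.max(sizeBytes, 1); // floor tiny segments, as in the source
        return Math.log(size) / Math.log(MERGE_FACTOR);
    }

    public static void main(String[] args) {
        long[] sizes = {1_000_000, 900_000, 80_000, 1_000};
        double maxLevel = Double.NEGATIVE_INFINITY;
        for (long s : sizes) maxLevel = Math.max(maxLevel, level(s));
        double levelBottom = maxLevel - LEVEL_LOG_SPAN;
        for (long s : sizes)
            System.out.printf("size=%d level=%.2f sameTier=%b%n",
                    s, level(s), level(s) >= levelBottom);
    }
}
```

Here the 1,000,000-byte and 900,000-byte segments (levels 6.00 and 5.95) land in the same tier, while 80,000 bytes (level 4.90) falls below levelBottom = 5.25 and would start the next, smaller tier on a later pass of the outer loop.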
There are other index-merge algorithms as well, such as geometric merging and dynamic Huffman merging (implemented in FirteX). A later post will summarize them together with experimental results.