Lucene Source Code Analysis --- 2
Source: Internet  Editor: 程序博客网  Date: 2024/05/02 04:13
Lucene source code analysis --- preparation for index creation
For ease of reference, here once more is the index-building example from the previous chapter:
```java
String filePath = ...  // directory containing the text files
String indexPath = ... // directory for the generated index
File fileDir = new File(filePath);
Directory dir = FSDirectory.open(Paths.get(indexPath));
Analyzer luceneAnalyzer = new SimpleAnalyzer();
IndexWriterConfig iwc = new IndexWriterConfig(luceneAnalyzer);
iwc.setOpenMode(OpenMode.CREATE);
IndexWriter indexWriter = new IndexWriter(dir, iwc);
File[] textFiles = fileDir.listFiles();
for (int i = 0; i < textFiles.length; i++) {
    if (textFiles[i].isFile()) {
        String temp = FileReaderAll(textFiles[i].getCanonicalPath(), "GBK");
        Document document = new Document();
        Field FieldPath = new StringField("path", textFiles[i].getPath(), Field.Store.YES);
        Field FieldBody = new TextField("body", temp, Field.Store.YES);
        document.add(FieldPath);
        document.add(FieldBody);
        indexWriter.addDocument(document);
    }
}
indexWriter.close();
```
First, FSDirectory's open function opens the index directory that will hold the index files generated later. Its code is as follows:
```java
public static FSDirectory open(Path path) throws IOException {
    return open(path, FSLockFactory.getDefault());
}

public static FSDirectory open(Path path, LockFactory lockFactory) throws IOException {
    if (Constants.JRE_IS_64BIT && MMapDirectory.UNMAP_SUPPORTED) {
        return new MMapDirectory(path, lockFactory);
    } else if (Constants.WINDOWS) {
        return new SimpleFSDirectory(path, lockFactory);
    } else {
        return new NIOFSDirectory(path, lockFactory);
    }
}
```
The default LockFactory obtained from FSLockFactory is NativeFSLockFactory, which produces the file lock NativeFSLock; we will look at that code in detail if later analysis requires it. Assume here that FSDirectory's open function creates an NIOFSDirectory. NIOFSDirectory extends FSDirectory and directly invokes its parent FSDirectory's constructor:
```java
protected FSDirectory(Path path, LockFactory lockFactory) throws IOException {
    super(lockFactory);
    if (!Files.isDirectory(path)) {
        Files.createDirectories(path);
    }
    directory = path.toRealPath();
}
```
The FSDirectory constructor creates the directory for the given Path if it does not already exist, and saves the resolved real path. FSDirectory extends BaseDirectory, whose constructor merely stores the LockFactory, so we will not dig any deeper here.
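The directory preparation above can be reproduced with plain JDK calls. The following is a minimal sketch (the class name `DirectoryPrep` is made up for illustration) showing the same create-if-missing-then-canonicalize behavior:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// A minimal sketch of what the FSDirectory constructor does with its Path:
// create the directory tree if it is missing, then resolve the canonical path.
public class DirectoryPrep {
    static Path prepare(Path path) throws IOException {
        if (!Files.isDirectory(path)) {
            Files.createDirectories(path); // also creates missing parent directories
        }
        return path.toRealPath(); // resolves symlinks and relative segments
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("lucene-demo");
        Path idx = prepare(tmp.resolve("index/sub"));
        System.out.println(Files.isDirectory(idx)); // prints true
    }
}
```

Note that `createDirectories` (plural) is what allows an index path several levels deep to be created in one call.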
Returning to the example at the top: next a SimpleAnalyzer is constructed, and from it an IndexWriterConfig is created, whose constructor directly calls the constructor of its parent class LiveIndexWriterConfig:
```java
LiveIndexWriterConfig(Analyzer analyzer) {
    this.analyzer = analyzer;
    ramBufferSizeMB = IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB;
    maxBufferedDocs = IndexWriterConfig.DEFAULT_MAX_BUFFERED_DOCS;
    maxBufferedDeleteTerms = IndexWriterConfig.DEFAULT_MAX_BUFFERED_DELETE_TERMS;
    mergedSegmentWarmer = null;
    delPolicy = new KeepOnlyLastCommitDeletionPolicy();
    commit = null;
    useCompoundFile = IndexWriterConfig.DEFAULT_USE_COMPOUND_FILE_SYSTEM;
    openMode = OpenMode.CREATE_OR_APPEND;
    similarity = IndexSearcher.getDefaultSimilarity();
    mergeScheduler = new ConcurrentMergeScheduler();
    indexingChain = DocumentsWriterPerThread.defaultIndexingChain;
    codec = Codec.getDefault();
    infoStream = InfoStream.getDefault();
    mergePolicy = new TieredMergePolicy();
    flushPolicy = new FlushByRamOrCountsPolicy();
    readerPooling = IndexWriterConfig.DEFAULT_READER_POOLING;
    indexerThreadPool = new DocumentsWriterPerThreadPool();
    perThreadHardLimitMB = IndexWriterConfig.DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB;
}
```
The LiveIndexWriterConfig constructor creates and stores a series of components; each will be analyzed as it comes up later, so we will not go further here.
Back in the Lucene example, an IndexWriter is created next from the IndexWriterConfig just built. IndexWriter is the most central class in Lucene's index creation; its constructor is rather long, so let us go through it step by step:
```java
public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
    if (d instanceof FSDirectory && ((FSDirectory) d).checkPendingDeletions()) {
        throw new IllegalArgumentException();
    }
    conf.setIndexWriter(this);
    config = conf;
    infoStream = config.getInfoStream();
    writeLock = d.obtainLock(WRITE_LOCK_NAME);
    boolean success = false;
    try {
        directoryOrig = d;
        directory = new LockValidatingDirectoryWrapper(d, writeLock);
        mergeDirectory = addMergeRateLimiters(directory);
        analyzer = config.getAnalyzer();
        mergeScheduler = config.getMergeScheduler();
        mergeScheduler.setInfoStream(infoStream);
        codec = config.getCodec();
        bufferedUpdatesStream = new BufferedUpdatesStream(infoStream);
        poolReaders = config.getReaderPooling();

        OpenMode mode = config.getOpenMode();
        boolean create;
        if (mode == OpenMode.CREATE) {
            create = true;
        } else if (mode == OpenMode.APPEND) {
            create = false;
        } else {
            create = !DirectoryReader.indexExists(directory);
        }

        boolean initialIndexExists = true;
        String[] files = directory.listAll();
        IndexCommit commit = config.getIndexCommit();
        StandardDirectoryReader reader;
        if (commit == null) {
            reader = null;
        } else {
            reader = commit.getReader();
        }

        if (create) {
            if (config.getIndexCommit() != null) {
                if (mode == OpenMode.CREATE) {
                    throw new IllegalArgumentException();
                } else {
                    throw new IllegalArgumentException();
                }
            }
            SegmentInfos sis = null;
            try {
                sis = SegmentInfos.readLatestCommit(directory);
                sis.clear();
            } catch (IOException e) {
                initialIndexExists = false;
                sis = new SegmentInfos();
            }
            segmentInfos = sis;
            rollbackSegments = segmentInfos.createBackupSegmentInfos();
            changed();
        } else if (reader != null) {
            ...
        } else {
            ...
        }

        pendingNumDocs.set(segmentInfos.totalMaxDoc());
        globalFieldNumberMap = getFieldNumberMap();
        config.getFlushPolicy().init(config);
        docWriter = new DocumentsWriter(this, config, directoryOrig, directory);
        eventQueue = docWriter.eventQueue();
        synchronized(this) {
            deleter = new IndexFileDeleter(files, directoryOrig, directory,
                config.getIndexDeletionPolicy(), segmentInfos, infoStream, this,
                initialIndexExists, reader != null);
            assert create || filesExist(segmentInfos);
        }
        if (deleter.startingCommitDeleted) {
            changed();
        }
        if (reader != null) {
            ...
        }
        success = true;
    } finally {
        if (!success) {
            IOUtils.closeWhileHandlingException(writeLock);
            writeLock = null;
        }
    }
}
```
The IndexWriter constructor first calls checkPendingDeletions, which attempts to delete files previously marked for deletion and reports whether any remain (if so, the constructor throws). checkPendingDeletions is defined in FSDirectory, as shown below:
```java
public boolean checkPendingDeletions() throws IOException {
    deletePendingFiles();
    return pendingDeletes.isEmpty() == false;
}

public synchronized void deletePendingFiles() throws IOException {
    if (pendingDeletes.isEmpty() == false) {
        for (String name : new HashSet<>(pendingDeletes)) {
            privateDeleteFile(name, true);
        }
    }
}

private void privateDeleteFile(String name, boolean isPendingDelete) throws IOException {
    try {
        Files.delete(directory.resolve(name));
        pendingDeletes.remove(name);
    } catch (NoSuchFileException | FileNotFoundException e) {
    } catch (IOException ioe) {
    }
}
```
deletePendingFiles ultimately calls Files.delete on every file name recorded in pendingDeletes; checkPendingDeletions then returns true if any of them could not be removed.
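The pending-deletes mechanism above is just a retry set of file names. The following is a hypothetical, simplified model of that pattern (the class `PendingDeletes` here is illustrative, not Lucene's class), assuming a single directory and string file names:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the pending-deletes pattern used by FSDirectory:
// names of files that could not be deleted earlier are kept in a set and retried.
class PendingDeletes {
    private final Set<String> pending = new HashSet<>();
    private final Path dir;

    PendingDeletes(Path dir) { this.dir = dir; }

    void markForDeletion(String name) { pending.add(name); }

    // Mirrors checkPendingDeletions: retry every pending delete,
    // then report whether anything is still stuck.
    boolean checkPendingDeletions() {
        for (String name : new HashSet<>(pending)) {
            try {
                Files.deleteIfExists(dir.resolve(name));
                pending.remove(name);
            } catch (IOException e) {
                // still held open somewhere; keep it in the set and retry later
            }
        }
        return !pending.isEmpty();
    }
}
```

The copy `new HashSet<>(pending)` matters: it lets the loop remove entries from the live set without a ConcurrentModificationException, just as in the Lucene code above.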
Back in the IndexWriter constructor, infoStream is the InfoStream obtained in the LiveIndexWriterConfig constructor (by default the no-op NoOutput instance), used for diagnostic output. FSDirectory's obtainLock is then called to acquire the index's write lock; we will not analyze the locking code further here.
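Although we skip NativeFSLockFactory's code, the write lock it hands out is essentially an OS-level file lock, which the JDK exposes via FileChannel.tryLock. As an assumption-labeled sketch (the class `WriteLockSketch` and the error message are made up), obtaining such a lock looks like this:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative model of a native write lock: tryLock asks the operating
// system for an exclusive lock on the lock file, which is the mechanism
// Lucene's write.lock ultimately relies on.
public class WriteLockSketch {
    public static FileLock obtain(Path lockFile) throws IOException {
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = channel.tryLock(); // null if another process holds it
        if (lock == null) {
            channel.close();
            throw new IOException("index is locked by another writer");
        }
        return lock;
    }
}
```

This is why only one IndexWriter at a time may open the same index directory: a second writer fails to obtain the lock and aborts instead of corrupting the index.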
Continuing in the IndexWriter constructor, a series of creations and assignments follows. Assume create is true, i.e. the index is being created for the first time or recreated; the existing segment information is then read via SegmentInfos' readLatestCommit function:
```java
public static final SegmentInfos readLatestCommit(Directory directory) throws IOException {
    return new FindSegmentsFile<SegmentInfos>(directory) {
        @Override
        protected SegmentInfos doBody(String segmentFileName) throws IOException {
            return readCommit(directory, segmentFileName);
        }
    }.run();
}
```
SegmentInfos' readLatestCommit creates a FindSegmentsFile and calls its run function, defined as follows:
```java
public T run() throws IOException {
    return run(null);
}

public T run(IndexCommit commit) throws IOException {
    long lastGen = -1;
    long gen = -1;
    IOException exc = null;
    for (;;) {
        lastGen = gen;
        String files[] = directory.listAll();
        String files2[] = directory.listAll();
        Arrays.sort(files);
        Arrays.sort(files2);
        if (!Arrays.equals(files, files2)) {
            continue;
        }
        gen = getLastCommitGeneration(files);
        if (gen == -1) {
            throw new IndexNotFoundException();
        } else if (gen > lastGen) {
            String segmentFileName = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS, "", gen);
            try {
                T t = doBody(segmentFileName);
                return t;
            } catch (IOException err) {
            }
        } else {
            throw exc;
        }
    }
}
```
The generic type T here is SegmentInfos. run first calls getLastCommitGeneration to obtain the generation. Suppose the index directory contains a file named segments_6; getLastCommitGeneration then returns 6, which is assigned to gen. Next, if gen is greater than lastGen, the segment information has been updated, so doBody is called to read that segments_6 file and return a SegmentInfos. (Note also that run lists the directory twice and retries if the two listings differ, to guard against reading while another process is committing.)
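The generation suffix of a segments_N file is, as far as I know, encoded in base 36 (Character.MAX_RADIX); for segments_6 this simply yields 6. A small self-contained sketch of that parsing (the class name is made up; only the parsing rule is drawn from Lucene):

```java
// Sketch of how the generation is recovered from a segments_N file name.
// The suffix after "segments_" is a base-36 number; the very first commit
// file is named just "segments" and counts as generation 0.
public class SegmentGeneration {
    static long generationFromSegmentsFileName(String fileName) {
        if (fileName.equals("segments")) {
            return 0;
        }
        return Long.parseLong(fileName.substring("segments_".length()),
                              Character.MAX_RADIX); // base 36
    }

    public static void main(String[] args) {
        System.out.println(generationFromSegmentsFileName("segments_6")); // prints 6
        System.out.println(generationFromSegmentsFileName("segments_a")); // prints 10
    }
}
```

Base-36 encoding is why a long-lived index can show file names like segments_1a: that is generation 46, not 1a commits of some other kind.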
From the readLatestCommit code above, doBody ultimately calls readCommit, defined in SegmentInfos as follows:
```java
public static final SegmentInfos readCommit(Directory directory, String segmentFileName) throws IOException {
    long generation = generationFromSegmentsFileName(segmentFileName);
    try (ChecksumIndexInput input = directory.openChecksumInput(segmentFileName, IOContext.READ)) {
        return readCommit(directory, input, generation);
    }
}
```
readCommit first creates a ChecksumIndexInput and then reads the segment information through the overloaded readCommit, returning a SegmentInfos. That overload is tied to the concrete segments_* file format and protocol, so we will not go further into it. The returned SegmentInfos holds the segment information.
Back in the IndexWriter constructor: if readLatestCommit returns a non-null SegmentInfos, its clear method empties it; if the index is being created for the first time, a new SegmentInfos is constructed (its constructor is empty). Next, SegmentInfos' createBackupSegmentInfos backs up the list of SegmentCommitInfo entries; this backup exists mainly to support the rollback operation. IndexWriter then calls changed to mark the segment information as modified.
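The backup-for-rollback idea can be shown in miniature. The following is a hypothetical, simplified model (class and method names invented; Lucene's real backup clones SegmentCommitInfo objects rather than copying strings):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of createBackupSegmentInfos: snapshot the segment list at
// construction/commit time so a later rollback can restore exactly that state.
class SegmentListWithRollback {
    private List<String> segments = new ArrayList<>();
    private List<String> rollbackSegments = new ArrayList<>();

    void add(String segName) { segments.add(segName); }

    // Snapshot the current state, as IndexWriter does right after reading the commit.
    void createBackup() { rollbackSegments = new ArrayList<>(segments); }

    // On rollback, discard uncommitted changes and restore the snapshot.
    void rollback() { segments = new ArrayList<>(rollbackSegments); }

    List<String> current() { return segments; }
}
```

The key point is that the backup is a copy, not an alias: later mutations of the live list leave the snapshot untouched, so rollback is always possible until the next commit replaces the snapshot.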
Continuing down the IndexWriter constructor: pendingNumDocs records the total number of documents in the index, and globalFieldNumberMap records information about the fields in the segments. getFlushPolicy returns the FlushByRamOrCountsPolicy created in the LiveIndexWriterConfig constructor, and its init function performs a simple assignment. Further down, a DocumentsWriter is created and its event queue is saved in eventQueue. Finally, the constructor creates an IndexFileDeleter, which manages the index files, for example by maintaining reference counts so that operations on index files stay consistent in a multithreaded environment.
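IndexFileDeleter's core bookkeeping is reference counting: a file is physically deleted only when no commit or in-flight merge still references it. A hypothetical, minimal model of that scheme (names invented; the real class also tracks commits and pending deletes):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of IndexFileDeleter's reference counting: each index file name
// maps to the number of live referents (commit points, merges, open readers).
class RefCountingDeleter {
    private final Map<String, Integer> refCounts = new HashMap<>();

    void incRef(String file) {
        refCounts.merge(file, 1, Integer::sum);
    }

    // Returns true when the last reference is released, i.e. the point at
    // which the real implementation would delete the file from the directory.
    boolean decRef(String file) {
        int count = refCounts.merge(file, -1, Integer::sum);
        if (count <= 0) {
            refCounts.remove(file);
            return true;
        }
        return false;
    }
}
```

This explains why deleting a document or merging segments does not immediately remove old files: earlier commit points may still reference them, and only when their counts reach zero are they reclaimed.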
The next chapter continues the analysis of the Lucene index-creation example's source code.