Indexing

Adding documents to an index
1.Keyword, UnIndexed, UnStored, Text
2.Heterogeneous Documents
3.Appendable Fields

 

Removing Documents from an index
1.delete()
2.hasDeletions()
3.isDeleted()

Undeleting Documents
1.undeleteAll()

Updating Documents in an index
1.Update = delete the Document, then re-add it
2.Updating by batching deletions

Boosting Documents and Fields
1.setBoost()
2.Document and Field

Indexing Dates
1.Keyword()
2.DateField
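
The Keyword approach to dates can be illustrated without Lucene: format the date as a fixed-width string so that string order matches chronological order. This is only a sketch; the class name DateKeys and the choice of day resolution (yyyyMMdd) are assumptions, not something the notes above specify.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: turns a date into a fixed-width, sortable keyword string.
final class DateKeys {
    // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd (assumed day resolution).
    static String toKeyword(LocalDate date) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        String earlier = toKeyword(LocalDate.of(2004, 9, 30));
        String later = toKeyword(LocalDate.of(2004, 10, 1));
        // String comparison agrees with date order because the width is fixed.
        System.out.println(earlier + " sorts before " + later + ": "
                + (earlier.compareTo(later) < 0));
    }
}
```

Such a string would then be stored in a Keyword field, so it is indexed as a single untokenized term.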

Indexing numbers
1.WhitespaceAnalyzer, StandardAnalyzer: keep numbers as tokens
2.SimpleAnalyzer, StopAnalyzer: discard numbers
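
The two groups differ in how they treat digits: whitespace-based tokenization keeps numeric tokens, while letter-based tokenization (as in SimpleAnalyzer and StopAnalyzer) drops them. The sketch below imitates both behaviours in plain Java. It is not the Lucene classes themselves, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Rough imitation of two tokenization styles; not the real Lucene analyzers.
final class NumberTokens {
    // Split on whitespace only: digits survive as tokens.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Keep only runs of letters, lowercased: digits are thrown away.
    static List<String> splitOnNonLetters(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Ext 2345 room 12";
        System.out.println(splitOnWhitespace(text)); // numbers kept
        System.out.println(splitOnNonLetters(text)); // numbers dropped
    }
}
```

If numbers must be searchable, the analyzer used at indexing time has to be one that keeps them.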

Indexing Fields used for sorting
1.Keyword
2.Integers, Floats and Strings
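
Why the type of a sort field matters can be shown in plain Java: the same values order differently when compared as strings and as integers. This is only an illustration of the ordering issue, with hypothetical names, and it does not use Lucene's sorting API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrates string order versus integer order for the same field values.
final class SortOrders {
    // Lexicographic order, as for a field sorted as strings.
    static List<String> sortAsStrings(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Numeric order, as for a field whose values are parsed as integers.
    static List<String> sortAsIntegers(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparingInt((String s) -> Integer.parseInt(s)));
        return copy;
    }

    public static void main(String[] args) {
        List<String> values = List.of("10", "9", "100");
        System.out.println(sortAsStrings(values));
        System.out.println(sortAsIntegers(values));
    }
}
```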

Indexing tuning
1.mergeFactor
2.maxMergeDocs
3.minMergeDocs


In-memory indexing: RAMDirectory

Batch indexing by using RAMDirectory as a buffer
1.Create an FSDirectory index
2.Create a RAMDirectory index
3.Add Documents to RAMDirectory index
4.Every so often, flush everything buffered in RAMDirectory into FSDirectory
5.Go to step 3.
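
The control flow of the steps above can be sketched without the library. In this sketch plain lists stand in for the RAMDirectory and FSDirectory indexes, and flushThreshold is a hypothetical setting for "every so often"; none of these names come from Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Library-free sketch of buffering in memory and flushing to disk in batches.
final class BufferedIndexer {
    private final List<String> ramBuffer = new ArrayList<>(); // stand-in for the RAMDirectory index
    private final List<String> diskIndex = new ArrayList<>(); // stand-in for the FSDirectory index
    private final int flushThreshold;                         // hypothetical batch size
    private int flushCount = 0;

    BufferedIndexer(int flushThreshold) {
        if (flushThreshold < 1) {
            throw new IllegalArgumentException("flushThreshold must be at least 1");
        }
        this.flushThreshold = flushThreshold;
    }

    // Step 3: add to the in-memory buffer; step 4 runs once the buffer is full.
    void addDocument(String doc) {
        ramBuffer.add(doc);
        if (ramBuffer.size() >= flushThreshold) flush();
    }

    // Step 4: move everything buffered in memory into the disk index.
    void flush() {
        if (ramBuffer.isEmpty()) return;
        diskIndex.addAll(ramBuffer);
        ramBuffer.clear();
        flushCount++;
    }

    int buffered() { return ramBuffer.size(); }
    int onDisk() { return diskIndex.size(); }
    int flushes() { return flushCount; }

    public static void main(String[] args) {
        BufferedIndexer indexer = new BufferedIndexer(3);
        for (int i = 0; i < 7; i++) indexer.addDocument("doc" + i);
        indexer.flush(); // final flush so nothing is left in memory
        System.out.println(indexer.onDisk() + " documents on disk after "
                + indexer.flushes() + " flushes");
    }
}
```

In Lucene 1.x the flush in step 4 is typically done by merging the RAM index into the disk index with IndexWriter.addIndexes(Directory[]), then starting a fresh RAMDirectory.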


Parallelizing indexing by working with multiple indexes

Limiting Field sizes: maxFieldLength
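
maxFieldLength caps how many terms of a single field are indexed (10,000 by default in Lucene 1.x); anything past the cap is silently left out of the index. The sketch below shows the effect in plain Java. The class name is hypothetical and whitespace tokenization is assumed for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the truncation effect of a per-field term limit.
final class FieldLimit {
    // Returns at most maxFieldLength terms; later terms are never indexed.
    static List<String> indexedTerms(String fieldText, int maxFieldLength) {
        List<String> terms = new ArrayList<>();
        for (String term : fieldText.trim().split("\\s+")) {
            if (terms.size() >= maxFieldLength) break;
            if (!term.isEmpty()) terms.add(term);
        }
        return terms;
    }

    public static void main(String[] args) {
        // With a limit of 3, the terms "d" and "e" cannot be found by a search.
        System.out.println(indexedTerms("a b c d e", 3));
    }
}
```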

Optimizing an index only affects the speed of searches against that index; it doesn’t affect the speed of indexing. Optimization works by minimizing the number of index files that need to be opened.


Concurrency rules
1.Any number of read-only operations may be executed concurrently.
2.Any number of read-only operations may be executed while an index is
being modified.
3.Only a single index-modifying operation may execute at a time.

Index locking
1.org.apache.lucene.lockDir
2.IndexReader.isLocked(directory)
3.IndexReader.unlock(directory)
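
File-based locking of this kind can be sketched in plain Java: a modifier atomically creates a lock file in the lock directory, and a second modifier that finds the file already there must back off. This is a simplified model, not Lucene's implementation; the class name, the inTempDir helper, and the bare "write.lock" file name are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Simplified model of a lock file kept in a lock directory.
final class LockFileSketch {
    private final Path lockFile;

    LockFileSketch(Path lockDir) {
        this.lockFile = lockDir.resolve("write.lock"); // simplified, assumed name
    }

    // Hypothetical convenience: use a fresh temporary directory as the lock directory.
    static LockFileSketch inTempDir() {
        try {
            return new LockFileSketch(Files.createTempDirectory("lockdir"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Atomically create the lock file; false means another modifier holds the lock.
    boolean obtain() {
        try {
            Files.createFile(lockFile);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isLocked() {
        return Files.exists(lockFile);
    }

    // Forcibly remove the lock; only safe when no modifier is actually running.
    void unlock() {
        try {
            Files.deleteIfExists(lockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LockFileSketch lock = LockFileSketch.inTempDir();
        System.out.println("first obtain: " + lock.obtain());
        System.out.println("second obtain: " + lock.obtain());
        lock.unlock();
        System.out.println("locked after unlock: " + lock.isLocked());
    }
}
```

Forcing an unlock while another process is still modifying the index risks corrupting it, so it is best reserved for cleaning up after a crash.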

Debugging indexing
writer.infoStream = System.out