Some notes on Lucene

Source: Internet | Editor: 程序博客网 | Time: 2024/04/29 12:19

Lucene 3.0 study notes 3: some IndexWriter methods and properties

Category: Internet

2.3 Basic index operations
2.3.1 Adding documents to an index
IndexWriter has two methods for adding a Document:

addDocument(Document) and addDocument(Document, Analyzer)

The first adds the Document using the writer's default analyzer; the second analyzes that Document with the analyzer you pass in.
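A minimal sketch of the difference, assuming Lucene 3.0 on the classpath and an in-memory RAMDirectory; the class AddDocumentsDemo and the field name "body" are made up for illustration. Both documents contain the same text, but the second is analyzed with WhitespaceAnalyzer (which keeps case) instead of the writer's default StandardAnalyzer (which lowercases), so each document ends up with different terms.

```java
import java.io.IOException;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo class: same text, two different analyzers.
class AddDocumentsDemo {
    static Directory buildIndex() throws IOException {
        Directory dir = new RAMDirectory();
        // StandardAnalyzer becomes the writer's default analyzer
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        // addDocument(Document): default analyzer -> terms "lucene", "rocks"
        writer.addDocument(textDoc("Lucene Rocks"));
        // addDocument(Document, Analyzer): given analyzer -> terms "Lucene", "Rocks"
        writer.addDocument(textDoc("Lucene Rocks"), new WhitespaceAnalyzer());
        writer.close();
        return dir;
    }

    static Document textDoc(String text) {
        Document doc = new Document();
        doc.add(new Field("body", text, Field.Store.YES, Field.Index.ANALYZED));
        return doc;
    }

    // Number of documents containing the exact (case-sensitive) term.
    static int hits(Directory dir, String term) throws IOException {
        IndexSearcher searcher = new IndexSearcher(dir, true);
        try {
            return searcher.search(new TermQuery(new Term("body", term)), 10).totalHits;
        } finally {
            searcher.close();
        }
    }
}
```

Here hits(dir, "lucene") and hits(dir, "Lucene") each find one document, and it is a different document in each case.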

2.3.2 Deleting documents from an index
IndexWriter provides four methods for deleting Documents:

deleteDocuments(Term);
deleteDocuments(Term[]);
deleteDocuments(Query);
deleteDocuments(Query[]);

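It is best to give every document a unique ID field so that a delete hits exactly one document; otherwise a single delete can wipe out a whole batch of documents. For example: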

writer.deleteDocuments(new Term("ID", documentID));

writer.deleteDocument(new Term(“ID”, documentID));

package com.langhua;

import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

/**
 * Lucene 3.0+: delete documents from an index
 * @author Administrator
 */
public class DeleteIndex {
    public static void main(String[] args) throws CorruptIndexException, IOException {
        // directory that holds the index
        String indexDir = "F://indexDir";
        // open the Directory
        Directory dir = new SimpleFSDirectory(new File(indexDir));
        // create = false: append to the existing index instead of overwriting it
        IndexWriter indexWriter = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                false, IndexWriter.MaxFieldLength.UNLIMITED);
        try {
            // delete the Document(s) whose filename is time.txt
            indexWriter.deleteDocuments(new Term("filename", "time.txt"));
            // optimize: merge segments and purge the deleted documents
            indexWriter.optimize();
            // commit the changes
            indexWriter.commit();
            System.out.println("has deletions = " + indexWriter.hasDeletions());
            // without optimize() these two numbers would differ:
            // maxDoc() still counts deleted documents, numDocs() does not
            System.out.println("maxDoc  = " + indexWriter.maxDoc());
            System.out.println("numDocs = " + indexWriter.numDocs());
        } finally {
            indexWriter.close();
        }
    }
}
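To see why the unique key matters, here is a minimal sketch assuming Lucene 3.0 and an in-memory RAMDirectory; DeleteDemo and the ID/category values are hypothetical. Deleting by the unique ID removes one document, while deleting by the shared category term removes every document that contains it.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo: index three documents, delete by one term, count what is left.
class DeleteDemo {
    static int remainingAfterDelete(String field, String value) throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        String[][] rows = { { "1", "book" }, { "2", "book" }, { "3", "dvd" } };
        for (String[] row : rows) {
            Document doc = new Document();
            // NOT_ANALYZED: each value is indexed as one exact term we can delete by
            doc.add(new Field("ID", row[0], Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("category", row[1], Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
        // removes EVERY document containing the term, not just the first match
        writer.deleteDocuments(new Term(field, value));
        writer.close(); // close() commits the buffered delete

        IndexReader reader = IndexReader.open(dir, true);
        try {
            return reader.numDocs(); // live (non-deleted) documents
        } finally {
            reader.close();
        }
    }
}
```

remainingAfterDelete("ID", "2") leaves 2 documents, but remainingAfterDelete("category", "book") leaves only 1.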

2.3.3 Updating documents in the index
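There are also two methods for updating. Lucene cannot actually modify a document in place; an update deletes the old document and then adds the new one: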

updateDocument(Term, Document)
//first deletes all documents containing the provided term and then adds the new document using the writer’s default analyzer.
updateDocument(Term, Document, Analyzer)
//does the same, but uses the provided analyzer instead of the writer’s default analyzer.

Usage:

writer.updateDocument(new Term("ID", documentId), newDocument);

writer.updateDocument(new Term(“ID”, documenteId), newDocument);

package com.langhua;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

/**
 * Lucene 3.0: update a document in the index
 *
 * @author Administrator
 */
public class UpdateIndex {
    public static void main(String[] args) throws IOException {
        String indexDir = "F://indexDir";
        String dataDir = "F://dateDir";
        Directory dir = new SimpleFSDirectory(new File(indexDir));
        // create = false: open the existing index instead of overwriting it
        IndexWriter indexWriter = new IndexWriter(dir, new StandardAnalyzer(
                Version.LUCENE_30), false, IndexWriter.MaxFieldLength.UNLIMITED);
        try {
            File file = new File(dataDir, "time.txt");
            if (!file.isFile()) {
                // without this check an empty Document would replace the old one
                System.out.println("time.txt not found in " + dataDir);
                return;
            }
            Reader contents = new FileReader(file);
            try {
                Document doc = new Document();
                // Reader-valued field: always analyzed and indexed, never stored
                doc.add(new Field("contents", contents));
                doc.add(new Field("filename", file.getName(),
                        Field.Store.YES, Field.Index.NOT_ANALYZED));
                // delete all documents with filename:time.txt, then add doc,
                // analyzed with the writer's default analyzer
                indexWriter.updateDocument(new Term("filename", "time.txt"), doc);
            } finally {
                // the Reader is consumed inside updateDocument(), so close it afterwards
                contents.close();
            }
        } finally {
            indexWriter.close();
        }
    }
}
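A small in-memory check of the delete-then-add behaviour, assuming Lucene 3.0; UpdateDemo and its ID/title values are hypothetical. After the update exactly one live document remains and it carries the new title; the old copy may still occupy a slot (it is only marked deleted), which is why the loop skips deleted documents.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo: replace the document with ID "1".
class UpdateDemo {
    // Returns "<number of live docs>:<title of the live doc>".
    static String run() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        writer.addDocument(book("1", "old title"));
        writer.commit();
        // deletes every document with ID:1, then adds the replacement
        writer.updateDocument(new Term("ID", "1"), book("1", "new title"));
        writer.close();

        IndexReader reader = IndexReader.open(dir, true);
        try {
            String title = null;
            for (int i = 0; i < reader.maxDoc(); i++) {
                if (!reader.isDeleted(i)) { // skip the old, deleted copy
                    title = reader.document(i).get("title");
                }
            }
            return reader.numDocs() + ":" + title;
        } finally {
            reader.close();
        }
    }

    static Document book(String id, String title) {
        Document doc = new Document();
        doc.add(new Field("ID", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
        return doc;
    }
}
```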

2.4 Field options
2.4.1 Field options for indexing

When creating a Field you usually specify two parameters. The first is
Field.Index.*
Analyze the value and index the resulting tokens:
Index.ANALYZED – use the analyzer to break the Field’s value into a stream of separate tokens
and make each token searchable. This is useful for normal text fields (body, title, abstract, etc.).

Index the value as-is, without analyzing it:
Index.NOT_ANALYZED – do index the field, but do not analyze the String.

Do not index the value:
Index.NO – don’t make this field’s value available for searching at all.

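I did not understand the last two at first. They behave like ANALYZED and NOT_ANALYZED, except that no norms are stored for the field (norms hold the index-time boost and length normalization used in scoring, see 2.6.1). This saves memory but disables those scoring effects for the field: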
Index.ANALYZED_NO_NORMS
Index.NOT_ANALYZED_NO_NORMS
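To compare the three main Field.Index options, here is a sketch assuming Lucene 3.0; IndexOptionsDemo and the field value are made up. The same value is indexed under each mode and probed with an exact TermQuery: ANALYZED produces lowercased word terms, NOT_ANALYZED produces one term equal to the whole string, and NO produces no terms at all (the field is stored only, which is why Store.YES is used here).

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo: how Field.Index changes which terms end up in the index.
class IndexOptionsDemo {
    static int hits(Field.Index mode, String term) throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        // Store.YES so that Index.NO is still a legal combination
        doc.add(new Field("title", "Lucene in Action", Field.Store.YES, mode));
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir, true);
        try {
            // TermQuery is not analyzed: it must match an indexed term exactly
            return searcher.search(new TermQuery(new Term("title", term)), 10).totalHits;
        } finally {
            searcher.close();
        }
    }
}
```

With ANALYZED, "lucene" matches; with NOT_ANALYZED only "Lucene in Action" matches; with NO nothing matches.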

2.4.2 Field options for storing fields
The other parameter to specify is Field.Store.*:

Store the value in the Document:
Store.YES — store the value.

Do not store the value in the Document; typically used for fields that only need to be searchable:
Store.NO – do not store the value.
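A sketch of what Store.NO means in practice, assuming Lucene 3.0; StoreOptionsDemo and the field values are hypothetical. The "body" field is found by a search, but the Document returned for the hit contains only the stored "title" field.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo: stored vs. not-stored fields.
class StoreOptionsDemo {
    static Document indexAndFetch() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        doc.add(new Field("title", "Lucene in Action", Field.Store.YES, Field.Index.ANALYZED));
        // indexed but not stored: searchable, yet not retrievable from a hit
        doc.add(new Field("body", "full text of the book", Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir, true);
        try {
            TopDocs top = searcher.search(new TermQuery(new Term("body", "book")), 10);
            // the returned Document holds only the stored fields
            return searcher.doc(top.scoreDocs[0].doc);
        } finally {
            searcher.close();
        }
    }
}
```

On the fetched document, get("title") returns "Lucene in Action" while get("body") returns null.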

2.4.3 Field options for term vectors
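When creating a Field there is one more, less commonly used parameter: TermVector. A term vector stores, for each document, the terms of that field together with their frequencies (and optionally their positions and offsets).
For reference, see http://callan.javaeye.com/blog/155602
I did not fully understand it either; the book says it is used later for highlighting, so it should be covered in a later chapter.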
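A minimal sketch of reading a term vector back, assuming Lucene 3.0; TermVectorDemo and the sample sentence are made up. The field is indexed with TermVector.WITH_POSITIONS_OFFSETS, and IndexReader.getTermFreqVector returns the document's sorted terms with their frequencies; stop words removed by StandardAnalyzer (such as "the") do not appear in it.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo: store and read a per-document term vector.
class TermVectorDemo {
    static int frequency(String term) throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        // WITH_POSITIONS_OFFSETS also records where each term occurs,
        // which is the information highlighters can use
        doc.add(new Field("body", "quick brown fox jumps over the lazy quick dog",
                Field.Store.YES, Field.Index.ANALYZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
        writer.addDocument(doc);
        writer.close();

        IndexReader reader = IndexReader.open(dir, true);
        try {
            // term vector of document 0, field "body": sorted terms + frequencies
            TermFreqVector vector = reader.getTermFreqVector(0, "body");
            int idx = vector.indexOf(term); // -1 if the term is absent
            return idx < 0 ? 0 : vector.getTermFrequencies()[idx];
        } finally {
            reader.close();
        }
    }
}
```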

2.4.4 Other Field values
Other Field constructors:

// uses a Reader instead of a String to represent the value. In this
// case the value cannot be stored (hardwired to Store.NO)
// and is always analyzed and indexed (Index.ANALYZED).
// i.e. this one is for analyzed text and cannot be stored
Field(String name, Reader value, TermVector vector)
// indexes a pre-analyzed TokenStream (the analyzer is bypassed); never stored
Field(String name, TokenStream tokenStream, TermVector termVector)
// binary value (e.g. an image): can only be stored, never indexed
// never indexed / no term vectors / must be Store.YES
Field(String name, byte[] value, Store store)
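A sketch of the TokenStream and byte[] constructors, assuming Lucene 3.0; OtherFieldsDemo, the field names and the sample bytes are hypothetical. The "tags" field is fed a WhitespaceTokenizer directly, so the writer's StandardAnalyzer never lowercases it, and the "thumbnail" bytes come back unchanged from the stored document.

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo: pre-analyzed and binary fields.
class OtherFieldsDemo {
    static Directory build() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        // binary value: stored only, never indexed
        doc.add(new Field("thumbnail", new byte[] { 1, 2, 3 }, Field.Store.YES));
        // pre-analyzed tokens: the writer's analyzer is bypassed, case is kept
        doc.add(new Field("tags", new WhitespaceTokenizer(new StringReader("Java Search")),
                Field.TermVector.NO));
        writer.addDocument(doc);
        writer.close();
        return dir;
    }

    static byte[] storedBytes() throws IOException {
        IndexSearcher searcher = new IndexSearcher(build(), true);
        try {
            return searcher.doc(0).getBinaryValue("thumbnail");
        } finally {
            searcher.close();
        }
    }

    static int tagHits(String term) throws IOException {
        IndexSearcher searcher = new IndexSearcher(build(), true);
        try {
            return searcher.search(new TermQuery(new Term("tags", term)), 10).totalHits;
        } finally {
            searcher.close();
        }
    }
}
```

tagHits("Java") finds the document, but tagHits("java") does not, because nothing lowercased the pre-analyzed tokens.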

2.4.5 Field option combinations
Finally, the book gives a chart showing which combination of options to use in which situation.

2.5 Multi-valued Fields
A field can hold multiple values: just add a field with the same name more than once. For example:

Document doc = new Document();
for (int i = 0; i < authors.length; i++) {
    // add a separate "author" field for each author
    doc.add(new Field("author", authors[i],
            Field.Store.YES,
            Field.Index.ANALYZED));
}
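A runnable version of the loop above, assuming Lucene 3.0; MultiValueDemo and the author names are hypothetical sample data. Each author value is searchable on its own, and Document.getValues returns all stored values of the field.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo: one document, one field name, several values.
class MultiValueDemo {
    static final String[] AUTHORS = { "Alice Smith", "Bob Jones" }; // sample data

    static Directory build() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        for (String author : AUTHORS) {
            // same field name added once per value
            doc.add(new Field("author", author, Field.Store.YES, Field.Index.ANALYZED));
        }
        writer.addDocument(doc);
        writer.close();
        return dir;
    }

    static int hits(String term) throws IOException {
        IndexSearcher searcher = new IndexSearcher(build(), true);
        try {
            return searcher.search(new TermQuery(new Term("author", term)), 10).totalHits;
        } finally {
            searcher.close();
        }
    }

    static String[] storedAuthors() throws IOException {
        IndexSearcher searcher = new IndexSearcher(build(), true);
        try {
            // every stored value of the field, in the order it was added
            return searcher.doc(0).getValues("author");
        } finally {
            searcher.close();
        }
    }
}
```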

2.6 Boosting Documents and Fields
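Setting a boost makes a document or field more (or less) important, so it ranks higher (or lower) in search results. The boost is an ordinary positive float rather than a value restricted to a range: the default is 1.0 (no effect), values above 1.0 raise importance and values below 1.0 lower it (the book's examples use 0.1 and 1.5). For example: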

Document doc = new Document();
doc.setBoost(0.1F);   // or doc.setBoost(1.5F);
Field senderNameField = new Field("senderName", senderName,
        Field.Store.YES,
        Field.Index.ANALYZED);
Field subjectField = new Field("subject", subject,
        Field.Store.YES,
        Field.Index.ANALYZED);
subjectField.setBoost(1.2F);
doc.add(senderNameField);
doc.add(subjectField);
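A small experiment showing an index-time field boost changing the ranking, assuming Lucene 3.0; BoostDemo, the ids and the subject text are hypothetical. Two documents have identical subjects, so without a boost their scores tie and the earlier document wins; giving the second document's subject field a boost of 2.0 moves it to the top.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo: same text in two docs, only the field boost differs.
class BoostDemo {
    // id of the best hit for subject:lucene when doc 1's subject gets secondBoost
    static String topId(float secondBoost) throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        writer.addDocument(mail("plain", 1.0f));          // doc 0, default boost
        writer.addDocument(mail("boosted", secondBoost)); // doc 1, same text
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir, true);
        try {
            TopDocs top = searcher.search(new TermQuery(new Term("subject", "lucene")), 10);
            return searcher.doc(top.scoreDocs[0].doc).get("id");
        } finally {
            searcher.close();
        }
    }

    static Document mail(String id, float subjectBoost) {
        Document doc = new Document();
        doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        Field subject = new Field("subject", "lucene in action",
                Field.Store.YES, Field.Index.ANALYZED);
        subject.setBoost(subjectBoost); // folded into the field's norm at index time
        doc.add(subject);
        return doc;
    }
}
```

topId(1.0f) returns "plain" (equal scores, earlier document first), while topId(2.0f) returns "boosted".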

2.6.1 Norms
I did not understand this part.
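One way to make norms concrete, as a sketch assuming Lucene 3.0's static Similarity.encodeNorm/decodeNorm; NormsDemo is hypothetical. With DefaultSimilarity, a field's norm is its boost (document boost times field boost) multiplied by 1/sqrt(number of terms in the field), and it is stored in a single byte per field per document. Values such as 0.5 or 2.0 survive the byte encoding exactly, but a boost like 1.2 is rounded down, so small boost differences can disappear.

```java
import org.apache.lucene.search.Similarity;

// Hypothetical demo of how an index-time norm is computed and stored.
class NormsDemo {
    // DefaultSimilarity's formula: boost * 1/sqrt(numTerms)
    static float norm(float boost, int numTerms) {
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    // Norms are kept as one byte, so this shows the value actually used in scoring
    static float storedAs(float norm) {
        return Similarity.decodeNorm(Similarity.encodeNorm(norm));
    }
}
```

For example, storedAs(norm(1.0f, 4)) gives 0.5, and storedAs(1.2f) comes back smaller than 1.2.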

2.7 Indexing dates & times
Lucene provides a utility class, DateTools.
A date can be indexed like this:

Document doc = new Document();
doc.add(new Field("indexDate",
        DateTools.dateToString(new Date(), DateTools.Resolution.DAY),
        Field.Store.YES,
        Field.Index.NOT_ANALYZED));
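To show what DateTools actually produces, here is a sketch assuming Lucene 3.0; DateToolsDemo and the sample dates are made up. At DAY resolution a date becomes a "yyyyMMdd" string computed in GMT, so the strings sort lexicographically in the same order as the dates, which is what makes a NOT_ANALYZED date field usable for sorting and range searches.

```java
import java.text.ParseException;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

import org.apache.lucene.document.DateTools;

// Hypothetical demo of DateTools at DAY resolution.
class DateToolsDemo {
    // build a Date from GMT calendar fields (month is 1-based here)
    static Date gmtDate(int year, int month, int day, int hour) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
        cal.clear();
        cal.set(year, month - 1, day, hour, 0, 0);
        return cal.getTime();
    }

    // what would be indexed in the "indexDate" field
    static String dayString(Date date) {
        return DateTools.dateToString(date, DateTools.Resolution.DAY);
    }

    // parse an indexed value back; the time of day is lost (midnight GMT)
    static Date parse(String indexed) throws ParseException {
        return DateTools.stringToDate(indexed);
    }
}
```

For example, dayString(gmtDate(2010, 3, 15, 18)) gives "20100315", and parsing that string returns midnight GMT on the same day.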

2.11 Optimizing an index
IndexWriter's methods for optimizing an index:

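// optimize the index: merge all segments into a single segment
optimize()
// merge down to at most maxNumSegments segments
optimize(int maxNumSegments)
// the methods above return only after optimizing has finished; with doWait = false this one
// returns immediately and the merge continues in the background (this only makes a difference
// with a MergeScheduler that runs merges in background threads, such as the default
// ConcurrentMergeScheduler)
optimize(boolean doWait)
// combination of the two parameters above
optimize(int maxNumSegments, boolean doWait)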

I still do not quite understand segments. Could someone explain them?
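Briefly: a Lucene index is made of segments, each a small self-contained index written when buffered documents are flushed (for example on commit). Searches visit every segment, and a deleted document is only marked as deleted until its segment is merged. optimize() merges segments and physically drops deleted documents. The sketch below, assuming Lucene 3.0 and a hypothetical OptimizeDemo class, creates two segments, deletes one document, and prints maxDoc/numDocs before and after optimizing.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo: deleted documents linger until optimize() merges them away.
class OptimizeDemo {
    // returns { maxDoc before, numDocs before, maxDoc after, numDocs after }
    static int[] run() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        for (int i = 0; i < 4; i++) {
            Document doc = new Document();
            doc.add(new Field("ID", String.valueOf(i), Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
            if (i == 2 || i == 3) {
                writer.commit(); // first segment: docs 0-2, second segment: doc 3
            }
        }
        writer.deleteDocuments(new Term("ID", "1"));
        writer.commit();
        int[] before = counts(dir); // deleted doc still occupies a slot

        writer.optimize();          // merge into one segment, purging deletions
        writer.commit();
        int[] after = counts(dir);
        writer.close();
        return new int[] { before[0], before[1], after[0], after[1] };
    }

    static int[] counts(Directory dir) throws IOException {
        IndexReader reader = IndexReader.open(dir, true);
        try {
            return new int[] { reader.maxDoc(), reader.numDocs() };
        } finally {
            reader.close();
        }
    }
}
```

Before optimizing, maxDoc is 4 and numDocs is 3; afterwards both are 3, which matches the comment in DeleteIndex about the two numbers differing without optimize().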

2.12 Other Directory Implementations
(Screenshot from the book; image not available.)
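Lucene 3.0 ships several Directory implementations, including SimpleFSDirectory, NIOFSDirectory, MMapDirectory and the in-memory RAMDirectory; FSDirectory.open(File) chooses a file-system implementation for the platform (as far as I know, SimpleFSDirectory on Windows and NIOFSDirectory elsewhere in 3.0). A sketch, with a hypothetical DirectoryDemo class and a temporary folder: it writes an index to disk, then loads the whole index into memory with the RAMDirectory(Directory) copy constructor. The temporary folder is not cleaned up.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo: on-disk index copied into a RAMDirectory.
class DirectoryDemo {
    static int run() throws IOException {
        File path = Files.createTempDirectory("lucene-index").toFile();
        // let Lucene pick the file-system Directory implementation
        Directory fsDir = FSDirectory.open(path);
        IndexWriter writer = new IndexWriter(fsDir, new StandardAnalyzer(Version.LUCENE_30),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        for (String name : new String[] { "a.txt", "b.txt" }) {
            Document doc = new Document();
            doc.add(new Field("filename", name, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();

        // copy every index file into memory, then search without touching the disk
        Directory ramDir = new RAMDirectory(fsDir);
        fsDir.close();
        IndexReader reader = IndexReader.open(ramDir, true);
        try {
            return reader.numDocs();
        } finally {
            reader.close();
        }
    }
}
```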
