Lucene, Part 3: Index Boosting
Source: Internet. Editor: 程序博客网. Time: 2024/05/17 07:36
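When building an index, giving a Document object a boost greater than 1.0 raises that document's score, so it ranks higher in the search results. In the Lucene 3.x API used here, the boost is set with the Document object's setBoost method. The code follows.
To have a baseline, first look at the search results for the index without any boost. This needs a search method: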
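public void search() {
    IndexReader reader = null;
    try {
        reader = IndexReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);
        // Match documents whose "content" field contains the term "like"
        TermQuery query = new TermQuery(new Term("content", "like"));
        // Fetch the top 10 hits, ordered by descending score
        TopDocs tds = searcher.search(query, 10);
        for (ScoreDoc sd : tds.scoreDocs) {
            Document doc = searcher.doc(sd.doc);
            // Prints: (internal doc number)name[email]-->id
            System.out.println("(" + sd.doc + ")" + doc.get("name") + "[" + doc.get("email") + "]-->" + doc.get("id"));
        }
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Close the reader even if the search fails
        try {
            if (reader != null) reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}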
Unit test:
@Test
public void testSearch() {
    IndexUtil iu = new IndexUtil();
    iu.search();
}
Test output:
(4)mike[ee@zttc.edu]-->5
(3)jetty[dd@sina.org]-->4
(5)jake[ff@itat.org]-->6
(0)zhangsan[aa@itat.org]-->1
(1)lisi[bb@itat.org]-->2
(2)john[cc@cc.org]-->3
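Now modify the indexing method index() so that selected documents are boosted. A map is first added to hold the boost values, keyed by e-mail domain. The boosting itself happens in the if/else block that calls doc.setBoost: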
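import java.io.File;
import java.io.IOException;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;
// The search() method shown earlier additionally needs these imports
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class IndexUtil {
    private String[] ids = {"1","2","3","4","5","6"};
    private String[] emails = {"aa@itat.org","bb@itat.org","cc@cc.org","dd@sina.org","ee@zttc.edu","ff@itat.org"};
    private String[] contents = {
        "welcome to visited the space,I like book",
        "hello boy, I like pingpeng ball",
        "my name is cc I like game",
        "I like football",
        "I like football and I like basketball too",
        "I like movie and swim"
    };
    private Date[] dates = null;
    private int[] attachs = {2,3,1,4,5,5};
    private String[] names = {"zhangsan","lisi","john","jetty","mike","jake"};
    // Boost values, keyed by e-mail domain
    private Map<String,Float> scores = new HashMap<String,Float>();
    private Directory directory = null;

    public IndexUtil() {
        try {
            scores.put("itat.org", 2.0f);
            scores.put("zttc.edu", 1.5f);
            directory = FSDirectory.open(new File("F:\\stady\\JAVA\\other\\Lucene\\test\\index02"));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Builds the index.
     */
    public void index() {
        IndexWriter writer = null;
        try {
            writer = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
            // Delete any existing index content before rebuilding
            writer.deleteAll();
            Document doc = null;
            for (int i = 0; i < ids.length; i++) {
                doc = new Document();
                doc.add(new Field("id", ids[i], Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
                doc.add(new Field("email", emails[i], Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.add(new Field("content", contents[i], Field.Store.NO, Field.Index.ANALYZED));
                doc.add(new Field("name", names[i], Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
                // Domain part of the address, i.e. everything after the last '@'
                String et = emails[i].substring(emails[i].lastIndexOf("@") + 1);
                // Boost the document. The default boost is 1.0f;
                // domains not in the map are demoted to 0.5f.
                if (scores.containsKey(et)) {
                    doc.setBoost(scores.get(et));
                } else {
                    doc.setBoost(0.5f);
                }
                writer.addDocument(doc);
            }
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (writer != null) writer.close();
            } catch (CorruptIndexException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}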
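The boost-selection step inside the indexing loop can be pulled out into a small pure function, which makes it easy to check without touching an index. The sketch below uses only the JDK. The class BoostLookup and its methods domainOf and boostFor are hypothetical names introduced for illustration and are not part of the original IndexUtil. The map contents and the 0.5f fallback are taken from the code above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the original IndexUtil class.
final class BoostLookup {
    // Fallback used above for domains that are not in the map
    static final float FALLBACK_BOOST = 0.5f;

    // Returns everything after the last '@' (the whole string if there is no '@')
    static String domainOf(String email) {
        return email.substring(email.lastIndexOf('@') + 1);
    }

    // Looks up the boost for an address by its domain, falling back to 0.5f
    static float boostFor(String email, Map<String, Float> scores) {
        Float boost = scores.get(domainOf(email));
        return boost != null ? boost : FALLBACK_BOOST;
    }

    public static void main(String[] args) {
        Map<String, Float> scores = new HashMap<String, Float>();
        scores.put("itat.org", 2.0f);
        scores.put("zttc.edu", 1.5f);
        String[] emails = {"aa@itat.org", "bb@itat.org", "cc@cc.org",
                           "dd@sina.org", "ee@zttc.edu", "ff@itat.org"};
        for (String email : emails) {
            System.out.println(email + " -> " + boostFor(email, scores));
        }
    }
}
```

With a helper like this, the loop body would reduce to a single doc.setBoost(BoostLookup.boostFor(emails[i], scores)) call.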
Rebuild the index by calling index(), then run the unit test testSearch again. The output is now:
(5)jake[ff@itat.org]-->6
(0)zhangsan[aa@itat.org]-->1
(1)lisi[bb@itat.org]-->2
(4)mike[ee@zttc.edu]-->5
(3)jetty[dd@sina.org]-->4
(2)john[cc@cc.org]-->3
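The three itat.org documents (boost 2.0) now come first, followed by the zttc.edu document (boost 1.5), while the two remaining documents (boost 0.5) drop to the bottom. The boost clearly took effect!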
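Why the order changes can be seen in a simplified model. In Lucene 3.x the document boost is folded into the field norm, which enters the score as a multiplicative factor, so for a single-term query the final score is roughly the unboosted score times the boost. The JDK-only sketch below re-ranks hits on that assumption. The class BoostRanking is hypothetical, and the base scores are made-up numbers chosen only to reproduce the unboosted order shown earlier. They are not scores computed by Lucene, and the model ignores Lucene's lossy one-byte norm encoding.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Simplified, hypothetical model of boosted ranking; not Lucene's actual scoring code.
final class BoostRanking {
    static final class Hit {
        final String name;
        final float baseScore; // assumed unboosted score (illustrative value only)
        final float boost;     // document boost, 1.0f when no boost is set

        Hit(String name, float baseScore, float boost) {
            this.name = name;
            this.baseScore = baseScore;
            this.boost = boost;
        }

        float score() {
            return baseScore * boost;
        }
    }

    // Returns the hit names ordered by descending boosted score
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<Hit>(hits);
        Collections.sort(sorted, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                return Float.compare(b.score(), a.score());
            }
        });
        List<String> names = new ArrayList<String>();
        for (Hit h : sorted) {
            names.add(h.name);
        }
        return names;
    }

    // Sample hits: made-up base scores, boosts as assigned in index()
    static List<Hit> sampleHits(boolean boosted) {
        List<Hit> hits = new ArrayList<Hit>();
        hits.add(new Hit("mike", 0.50f, boosted ? 1.5f : 1.0f));     // zttc.edu
        hits.add(new Hit("jetty", 0.48f, boosted ? 0.5f : 1.0f));    // sina.org
        hits.add(new Hit("jake", 0.44f, boosted ? 2.0f : 1.0f));     // itat.org
        hits.add(new Hit("zhangsan", 0.42f, boosted ? 2.0f : 1.0f)); // itat.org
        hits.add(new Hit("lisi", 0.40f, boosted ? 2.0f : 1.0f));     // itat.org
        hits.add(new Hit("john", 0.38f, boosted ? 0.5f : 1.0f));     // cc.org
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("unboosted: " + rank(sampleHits(false)));
        System.out.println("boosted:   " + rank(sampleHits(true)));
    }
}
```

Under these assumed scores, the boosted ranking puts the itat.org documents first and the 0.5-boosted documents last, the same pattern as in the test output above.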