Lucene (1): Getting-Started Example

Source: Internet | Published by: ZTE Data Analysis | Editor: Programmer Blog Network | Time: 2024/06/05 05:21

1. What is Lucene?

Lucene is a Java-based full-text information retrieval toolkit that gives applications indexing and search capabilities.

2. Why use Lucene?

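1) Without Lucene, searching a database table for records by keyword means using LIKE, which matches the text character by character. Queries like this are very expensive for the database.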

2) When the data volume is very large, this makes queries in the system slow. Lucene reduces the load on the database and improves query performance.
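The cost argument can be made concrete with a toy model in plain Java. This is not Lucene code. A LIKE-style scan touches every row on every query. An inverted index maps each term to the ids of the rows that contain it, so a query becomes a single lookup. The class and method names are hypothetical. The naive tokenizer (lowercase, then split on anything that is not a letter or digit) is only a stand-in for a real Analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model (hypothetical, not Lucene's implementation): LIKE-style scan vs. inverted index lookup
public class InvertedIndexSketch {
    // Like "WHERE text LIKE '%keyword%'": every row is examined on every query
    static List<Integer> likeScan(List<String> rows, String keyword) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < rows.size(); id++) {
            if (rows.get(id).contains(keyword)) {
                hits.add(id);
            }
        }
        return hits;
    }

    // Built once at indexing time: term -> sorted ids of the rows containing it
    static Map<String, SortedSet<Integer>> buildIndex(List<String> rows) {
        Map<String, SortedSet<Integer>> index = new HashMap<>();
        for (int id = 0; id < rows.size(); id++) {
            // naive tokenizer: lowercase, split on runs of non-letter/non-digit characters
            for (String term : rows.get(id).toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                if (term.isEmpty()) {
                    continue;
                }
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // At query time: one map lookup instead of a full scan
    static Set<Integer> lookup(Map<String, SortedSet<Integer>> index, String term) {
        return index.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "Tom is learning Lucene", "this is a program!", "Lucene builds an index");
        System.out.println("scan:   " + likeScan(rows, "Lucene"));
        System.out.println("lookup: " + lookup(buildIndex(rows), "lucene"));
    }
}
```

The two approaches differ in one respect. LIKE '%luc%' also matches substrings, whereas this index (like Lucene's) matches only whole terms produced by the tokenizer.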

3. Common concepts in Lucene

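1) Document: describes a document that is to be indexed.

2) Field: describes a Document. A Document is made up of one or more named Fields.

3) Analyzer: before a Document can be indexed, its content has to be split into terms according to certain rules. The Analyzer performs this tokenization.

4) IndexWriter: the bridge between Documents and the index. It adds Documents to the index.

5) Directory: describes where the index is stored.

6) Query: wraps what you want to search for into a form the index can understand.

7) IndexSearcher: searches the index for content that matches a Query.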
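You can see the Analyzer at work by asking it for the terms it produces from a piece of text. This is a minimal sketch that assumes Lucene 4.10.x, the version used in this article's full example. The helper name `analyze` is hypothetical. The TokenStream calls follow the standard consumption order: `reset()`, a loop of `incrementToken()`, then `end()` and `close()`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerDemo {
    // Hypothetical helper: collects the terms an Analyzer emits for one field value
    static List<String> analyze(Analyzer analyzer, String field, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                      // required before the first incrementToken()
            while (ts.incrementToken()) {
                terms.add(term.toString());  // the text of the current term
            }
            ts.end();                        // finish the stream before close()
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            System.out.println(analyze(analyzer, "doSomthing", "I am learning lucene!"));
        }
    }
}
```

StandardAnalyzer lowercases the terms and drops punctuation such as "!". This splitting is what makes a single word searchable.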

4. Lucene's execution flow

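Working with Lucene is somewhat like working with a database. You first create the "database" (the index), then insert data into its "table" row by row. Once the data has been inserted, you can operate on the table with CRUD operations.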

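1) Create an index directory. Then wrap the information to be searched into a Document object, putting each piece of information into a matching Field, and add this object to the index directory. The index can be kept either in memory or on disk.

2) If some information turns out to be wrong and has to be deleted, its index entry must be deleted as well; otherwise searches will still find it. In that case, delete the corresponding entry from the index by its id.

3) If some information is updated, its index entry must be updated as well: delete the old entry first, then add the new one.

4) Full-text search works much like querying a database. First create an index reader, then build a Query object, and call search() to get the results.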
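The four operations in this flow can be sketched against an in-memory index. This sketch assumes Lucene 4.10.x. The record layout, an `id` key plus a `name` value with both indexed as single terms, is a hypothetical way to give each entry the "id" the text refers to. The helpers `record`, `buildDemoIndex` and `count` are illustrative names. `IndexWriter.updateDocument(Term, doc)` performs the delete-then-add from step 3 as one call.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class IndexCrudSketch {
    // Hypothetical record layout: an "id" key plus a "name" value, each indexed as one term
    static Document record(String id, String name) {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));
        doc.add(new StringField("name", name, Field.Store.YES));
        return doc;
    }

    // Steps 1)-3): add, delete by id, update by id
    static Directory buildDemoIndex() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_1, new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            // 1) add Documents, like inserting rows
            writer.addDocument(record("1", "Tom"));
            writer.addDocument(record("2", "Jerry"));
            // 2) delete by id, otherwise searches would still find the entry
            writer.deleteDocuments(new Term("id", "2"));
            // 3) update: deletes every document containing the term, then adds the new one
            writer.updateDocument(new Term("id", "1"), record("1", "Tommy"));
        } // close() commits the changes
        return dir;
    }

    // Step 4): open a reader, build a Query, call search(); returns the hit count
    static int count(Directory dir, String field, String value) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            return searcher.search(new TermQuery(new Term(field, value)), 10).totalHits;
        }
    }

    public static void main(String[] args) throws IOException {
        Directory dir = buildDemoIndex();
        System.out.println("Tom: " + count(dir, "name", "Tom"));
        System.out.println("Tommy: " + count(dir, "name", "Tommy"));
        System.out.println("id 2: " + count(dir, "id", "2"));
        dir.close();
    }
}
```

The `id` field is indexed as a StringField, not a TextField, so that `new Term("id", ...)` matches the whole value exactly.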

5. Writing the index into memory with Lucene

1) Create an in-memory directory object, RAMDirectory:

Directory directory = new RAMDirectory();

2) Create the index writer, IndexWriter:

IndexWriterConfig writerConfig = new IndexWriterConfig(luceneVersion, new StandardAnalyzer());

IndexWriter indexWriter = new IndexWriter(directory, writerConfig);

3) Create a Document object

An index created in Lucene can be thought of as a table in a database. The table has fields, and content added to it can be matched and queried by field.

Document document=new Document();

Add content to the Document object:

document.add(new StringField("name","Tom",Field.Store.YES));

4) Use the index writer to store the document in the in-memory directory:

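indexWriter.addDocument(document);

indexWriter.commit();  // make the document visible; DirectoryReader.open() fails on a directory that has no commit yet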

5) Create the index searcher, IndexSearcher, over the in-memory directory that was written to.

First create a DirectoryReader:

 DirectoryReader reader=DirectoryReader.open(directory);

 IndexSearcher searcher = new IndexSearcher(reader);

6) Wrap the keyword in a Query object:

 Query query = new TermQuery(new Term("name", "Tom"));

7) The search returns a TopDocs. Iterate over the Documents it references and print the results:

  TopDocs rs = searcher.search(query, null, 10);

for (int i = 0; i < rs.scoreDocs.length; i++) {
    // rs.scoreDocs[i].doc is the document's internal id in the index, counted from 0
    Document hit = searcher.doc(rs.scoreDocs[i].doc);
    System.out.println("name:" + hit.getField("name").stringValue());
}

8) Close the DirectoryReader, the IndexWriter, and the RAMDirectory:

reader.close();

indexWriter.close();

directory.close();
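These steps keep the index in RAM, but section 4 notes that it can also be kept on disk. The following is a minimal sketch of the same steps with an on-disk directory. It assumes Lucene 4.10.x, where `FSDirectory.open` takes a `java.io.File`; later major versions take a `java.nio.file.Path` instead. The index path and the helper name `writeThenSearch` are hypothetical. The sketch closes the writer, which commits, before it opens the reader.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class DiskIndexSketch {
    // Steps 1)-8) against an on-disk index under indexDir (a hypothetical, caller-chosen path)
    static int writeThenSearch(File indexDir) throws IOException {
        // 1) on-disk Directory instead of RAMDirectory
        try (Directory directory = FSDirectory.open(indexDir)) {
            // 2) OpenMode.CREATE overwrites an existing index here instead of appending to it
            IndexWriterConfig writerConfig = new IndexWriterConfig(Version.LUCENE_4_10_1, new StandardAnalyzer());
            writerConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
            try (IndexWriter indexWriter = new IndexWriter(directory, writerConfig)) {
                // 3)-4) build the Document and add it
                Document document = new Document();
                document.add(new StringField("name", "Tom", Field.Store.YES));
                indexWriter.addDocument(document);
            } // closing the writer commits, so the reader below can see the document
            // 5)-7) open the reader, search, print the stored field
            try (DirectoryReader reader = DirectoryReader.open(directory)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                TopDocs rs = searcher.search(new TermQuery(new Term("name", "Tom")), 10);
                for (int i = 0; i < rs.scoreDocs.length; i++) {
                    System.out.println("name:" + searcher.doc(rs.scoreDocs[i].doc).get("name"));
                }
                return rs.totalHits;
            }
        } // 8) try-with-resources closes the reader, the writer and the directory
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical location; any writable directory works
        System.out.println(writeThenSearch(new File("lucene-index")));
    }
}
```

Without `OpenMode.CREATE`, the default mode appends to an existing index. Running the program twice would then index "Tom" twice.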

The complete example, which writes the index into memory and then queries it:

package com.cn.test;

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

/**
 * Writes the index into memory and queries the in-memory index.
 */
public class Test {
    // Lucene version currently in use
    private static Version luceneVersion = Version.LUCENE_4_10_1;
    private RAMDirectory ramDirectory = null;
    private static TopDocs topDocs = null;

    /**
     * Creates the index.
     * @throws IOException
     */
    public void createIndex() throws IOException {
        // 1. Create the in-memory directory object, which is where the index is stored
        ramDirectory = new RAMDirectory();
        // 2. Create the IndexWriter, which adds documents to the index
        // 2.1 Create the analyzer: the content of a document to be indexed must be split into terms by certain rules
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig writerConfig = new IndexWriterConfig(luceneVersion, analyzer);
        IndexWriter indexWriter = new IndexWriter(ramDirectory, writerConfig);
        // 3. Create the Document object, which describes a document to be indexed
        Document document = new Document();
        /*
         * 4. Add three fields to the Document
         *  document.add(field): field is a field to be indexed
         *  StringField treats the string as a single unit; it is not split
         *  The string in a TextField can be split into terms
         *  new StringField(name, value, stored)
         *  name: field name
         *  value: field value
         *  Field.Store.YES: store the field value (the original, unanalyzed value is stored)
         *  Field.Store.NO: do not store the field value (storing is independent of indexing)
         */
        document.add(new StringField("username", "Tom", Field.Store.YES));
        document.add(new StringField("message", "this is a program!", Field.Store.YES));
        document.add(new StringField("doSomthing", "I am learning lucene!", Field.Store.YES));
        // 5. Add the document to the index with the index writer
        indexWriter.addDocument(document);
        // 6. Close the index writer; this commits the document so a reader can see it
        indexWriter.close();
    }

    /**
     * Queries the index.
     * @throws IOException
     */
    public void searchIndex() throws IOException {
        // 1. Open a DirectoryReader to read the directory
        DirectoryReader directoryReader = DirectoryReader.open(ramDirectory);
        // 2. Create the IndexSearcher used to search the index
        IndexSearcher indexSearcher = new IndexSearcher(directoryReader);
        // 3. Create the Query: wrap the keyword to search for in a Term (field name + value)
        Query query = new TermQuery(new Term("message", "this is a program!"));
        // 4. Search the index; the returned TopDocs references the matching Documents
        topDocs = indexSearcher.search(query, null, 10);
        for (int i = 0; i < topDocs.scoreDocs.length; i++) {
            Document doc = indexSearcher.doc(topDocs.scoreDocs[i].doc);
            System.out.println("message:" + doc.getField("message").stringValue());
        }
        // Release the reader once the results have been read
        directoryReader.close();
    }

    // Test
    public static void main(String[] args) throws IOException {
        // Create the index, then search it
        Test test = new Test();
        long startTime = System.currentTimeMillis();
        System.out.println("******** Search timing started ********");
        test.createIndex();
        test.searchIndex();
        long endTime = System.currentTimeMillis();
        System.out.println("Took " + (endTime - startTime) + " ms in total, found " + topDocs.totalHits + " record(s)");
        test.ramDirectory.close();
    }
}
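The comment in step 4 of the example says that a StringField is kept whole while a TextField is split into terms. This difference decides what a TermQuery can find. The following minimal sketch assumes Lucene 4.10.x. It indexes the same sentence once as the StringField `message`, as in the example, and once as a TextField under the hypothetical field name `body`. The helpers `index` and `hits` are illustrative names. A TermQuery is not analyzed, so its term must equal an indexed term exactly.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class FieldTypeDemo {
    // Indexes one sentence twice: as a StringField ("message") and as a TextField ("body", hypothetical)
    static Directory index(String sentence) throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_1, new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            Document doc = new Document();
            doc.add(new StringField("message", sentence, Field.Store.YES)); // one term: the whole string
            doc.add(new TextField("body", sentence, Field.Store.YES));      // analyzed into separate terms
            writer.addDocument(doc);
        } // close() commits
        return dir;
    }

    // TermQuery is not analyzed: the term must match an indexed term exactly
    static int hits(Directory dir, String field, String term) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return new IndexSearcher(reader).search(new TermQuery(new Term(field, term)), 10).totalHits;
        }
    }

    public static void main(String[] args) throws IOException {
        Directory dir = index("this is a program!");
        System.out.println("message / whole sentence: " + hits(dir, "message", "this is a program!"));
        System.out.println("message / program: " + hits(dir, "message", "program"));
        System.out.println("body / program: " + hits(dir, "body", "program"));
        dir.close();
    }
}
```

With a StringField, only the exact full value matches. With a TextField, a single analyzed word such as "program" matches. That is usually what keyword search needs.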
