Lucene in Practice (Part 1): Introduction to Lucene and Running HelloWorld (Eclipse Project Included)

Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 00:48

Preface

Here is a CD from the past; listen to the thoughts we once shared.

Introduction to Lucene

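Lucene is an open-source, mature Java search library. It maintains an inverted index over a collection of documents (Document) and exposes a simple, easy-to-use API. For a fuller introduction to Lucene, see its encyclopedia (Baike) entry.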
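To make "inverted index" concrete, here is a minimal sketch in plain Java with no Lucene dependency: each term maps to the set of ids of the documents that contain it, so finding the documents for a term is a single map lookup instead of a scan over every document. The class name ToyInvertedIndex and its naive lowercase-and-split tokenization are assumptions for illustration only; Lucene's real index also records term frequencies, positions and much more.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index (hypothetical, not Lucene's format): term -> ids of documents containing it.
class ToyInvertedIndex {
    private final List<String> docs = new ArrayList<>();               // id -> original text
    private final Map<String, Set<Integer>> postings = new TreeMap<>(); // term -> sorted doc ids

    // Adds a document and returns its id (its position in insertion order).
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        // Naive tokenization: lowercase, then split on non-word characters.
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue; // split yields a leading "" when the text starts with punctuation
            }
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Returns the ids of documents containing the term, without scanning any document text.
    Set<Integer> lookup(String term) {
        return Collections.unmodifiableSet(
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet()));
    }

    // Returns the stored text of a document by id.
    String get(int id) {
        return docs.get(id);
    }
}
```

A search engine answers a one-term query by reading that term's posting set; multi-term queries combine several sets (intersection for AND, union for OR).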

The figure below shows Lucene's indexing and search workflow:

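The following table describes the role of each package in Lucene.

| Package | Purpose |
|---|---|
| org.apache.lucene.analysis | Language analyzers, mainly used for tokenization; Chinese word segmentation can be added by extending these classes |
| org.apache.lucene.document | Document structure for stored index entries, similar to a table schema in a relational database |
| org.apache.lucene.index | Index management, including creating and deleting index entries |
| org.apache.lucene.queryParser | Query parser; handles operators on query keywords such as AND, OR and NOT |
| org.apache.lucene.search | Search management: runs a query against the index and returns the results |
| org.apache.lucene.store | Data storage management, mainly low-level I/O operations |
| org.apache.lucene.util | Shared utility classes |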
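The analysis package's job is to turn raw text into the terms that end up in the index. The plain-Java sketch below imitates that step under stated assumptions: it lowercases the text, splits on anything that is not a letter or digit, and drops a small, hypothetical stop-word list. ToyAnalyzer is an illustrative name; this is not Lucene's StandardAnalyzer, whose tokenization rules are considerably more sophisticated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analyzer (hypothetical): lowercase, split on non-letters/digits, drop stop words.
class ToyAnalyzer {
    // Illustrative stop list only; a real analyzer ships its own, larger list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // \p{L} = any Unicode letter, \p{N} = any Unicode digit; everything else separates tokens.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }
}
```

Because this splitter only breaks at non-letter characters, a run of Chinese text such as "中文分词" stays a single term here, which illustrates why the table mentions extending the analysis classes for Chinese word segmentation.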

Getting Started with Lucene

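In the figure above, the parts shown in red are the ones we handle through Lucene's API, and none of them is difficult. The general steps for implementing full-text search with Lucene (without integrating any framework) are as follows: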

Creating the Index

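package org.xiaom.lucene;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MyIndexCreater {
    private static IndexWriter indexWriter;
    private static Version version = Version.LUCENE_35;

    /**
     * Creates an index for every text file (.java, .xml, .txt) in this directory and its subdirectories.
     * @param docPath   directory containing the documents
     * @param indexPath directory where the index is stored
     */
    public static void createContainChild(String docPath, String indexPath) throws IOException {
        File docDir = new File(docPath);
        File indexDir = new File(indexPath);
        // 1. Open the directory that stores the index
        Directory directory = FSDirectory.open(indexDir);
        // 2. Create the IndexWriterConfig
        IndexWriterConfig conf = new IndexWriterConfig(version, new StandardAnalyzer(version));
        // Overwrite any previous index on every run
        conf.setOpenMode(OpenMode.CREATE);
        // Create the IndexWriter from the IndexWriterConfig
        indexWriter = new IndexWriter(directory, conf);
        try {
            indexDir(docDir);
            // 7. Commit (required)
            indexWriter.commit();
        } finally {
            // Always close the IndexWriter, even if indexing fails, so its write lock is released
            indexWriter.close();
        }
    }

    // Indexes this directory and its subdirectories; returns the total number of files indexed
    private static int indexDir(File dir) {
        int c = 0;
        File[] files = dir.listFiles();
        if (files == null) { // not a directory, or it could not be read
            return 0;
        }
        for (File f : files) {
            if (f.isDirectory()) {
                c += indexDir(f); // include the files indexed in subdirectories
            } else if (f.getName().endsWith(".java")
                    || f.getName().endsWith(".txt")
                    || f.getName().endsWith(".xml")) {
                c += indexFile(f);
            }
        }
        return c;
    }

    // Indexes a single file; returns 1 on success, 0 on failure.
    // The first line becomes the "title" field, the remaining lines the "content" field.
    private static int indexFile(File f) {
        StringBuilder contentStr = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new FileReader(f))) {
            String titleStr = br.readLine();
            if (titleStr == null) { // empty file: Field does not accept a null value
                titleStr = "";
            }
            String s;
            while ((s = br.readLine()) != null) {
                contentStr.append(s).append("\n");
            }
            // 3. Create a Document
            Document doc = new Document();
            // 4. Create the Fields
            Field name = new Field("name", f.getName(), Store.YES, Index.ANALYZED);
            Field title = new Field("title", titleStr, Store.YES, Index.ANALYZED);
            Field content = new Field("content", contentStr.toString(), Store.YES, Index.ANALYZED);
            // 5. Add the Fields to the Document
            doc.add(name);
            doc.add(title);
            doc.add(content);
            // 6. Add the Document to the IndexWriter
            indexWriter.addDocument(doc);
            return 1;
        } catch (IOException e) {
            e.printStackTrace();
            return 0;
        }
    }
}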
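indexDir and indexFile walk the document directory recursively and keep only .java, .txt and .xml files. If you would rather not hand-roll the recursion, the file-selection half can be sketched with java.nio.file.Files.walk (Java 8 or later), which also descends into subdirectories. TextFileFinder is a hypothetical helper name, not part of Lucene; each returned Path could be converted with path.toFile() and handed to the per-file indexing logic.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: collects every .java/.txt/.xml file under root, subdirectories included,
// using the same extension filter as indexDir.
class TextFileFinder {
    static List<Path> find(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) { // Files.walk holds open handles; close it
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.endsWith(".java") || name.endsWith(".txt") || name.endsWith(".xml");
                    })
                    .sorted() // deterministic order makes the output easier to compare
                    .collect(Collectors.toList());
        }
    }
}
```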

Searching

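package org.xiaom.lucene;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MyIndexSearcher {
    private static Version version = Version.LUCENE_35;

    /**
     * @param indexPath directory where the index is stored
     * @param key       field to search, e.g. "content"
     * @param value     query text for that field
     */
    public static void search(String indexPath, String key, String value) {
        IndexReader ireader = null;
        try {
            // 1. Create the IndexReader
            ireader = IndexReader.open(FSDirectory.open(new File(indexPath)));
            // 2. Create an IndexSearcher from the IndexReader
            IndexSearcher indexSearcher = new IndexSearcher(ireader);
            // 3. Create the QueryParser (same analyzer as used for indexing)
            QueryParser queryParser = new QueryParser(version, key, new StandardAnalyzer(version));
            // 4. Parse the query text into a Query
            Query query = queryParser.parse(value);
            // 5. Receive the return value of indexSearcher.search in a TopDocs (at most 100 hits)
            TopDocs topDocs = indexSearcher.search(query, 100);
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;
            // 6. Fetch and print each Document (one stream, so the lines stay in order)
            System.out.println("total hit:" + topDocs.totalHits);
            System.out.println("total document:" + scoreDocs.length);
            System.out.println("==================================================");
            for (ScoreDoc sd : scoreDocs) {
                Document doc = indexSearcher.doc(sd.doc);
                String content = doc.get("content");
                System.out.println("name:" + doc.get("name"));
                System.out.println("title:" + doc.get("title"));
                System.out.println("score:" + sd.score);
                // Preview at most 80 characters; substring(0, 80) would throw on shorter content
                System.out.println("content:" + content.substring(0, Math.min(80, content.length())));
            }
        } catch (IOException e) { // also covers CorruptIndexException
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        } finally {
            if (ireader != null) {
                try {
                    ireader.close(); // release the index files
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}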
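indexSearcher.search(query, 100) returns at most 100 hits sorted by descending score, while topDocs.totalHits counts every matching document, which is why the two printed totals can differ. The plain-Java sketch below imitates that contract with a simplified TF-IDF score, tf × (1 + ln(N / df)), summed over the query terms with OR semantics (QueryParser's default operator is also OR). The formula and the names ToySearch, Hit and Result are assumptions for illustration; Lucene 3.5's DefaultSimilarity uses a different, more detailed formula.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Simplified ranking that mirrors the TopDocs contract: count every match, return the best n.
// Score = sum over query terms of tf * (1 + ln(N / df)); an assumption, NOT DefaultSimilarity.
class ToySearch {
    static final class Hit {
        final int doc;      // position of the document in the input list
        final double score;
        Hit(int doc, double score) { this.doc = doc; this.score = score; }
    }

    static final class Result {
        final int totalHits; // number of documents matching at least one query term
        final List<Hit> top;  // at most n hits, highest score first
        Result(int totalHits, List<Hit> top) { this.totalHits = totalHits; this.top = top; }
    }

    // docs are pre-tokenized term lists; queryTerms are OR-ed together.
    static Result search(List<List<String>> docs, List<String> queryTerms, int n) {
        int numDocs = docs.size();
        double[] scores = new double[numDocs];
        for (String term : queryTerms) {
            int df = 0; // document frequency: how many documents contain the term
            for (List<String> d : docs) {
                if (d.contains(term)) df++;
            }
            if (df == 0) continue; // the term occurs nowhere and contributes nothing
            double idf = 1 + Math.log((double) numDocs / df); // rarer terms weigh more
            for (int i = 0; i < numDocs; i++) {
                scores[i] += Collections.frequency(docs.get(i), term) * idf; // tf * idf
            }
        }
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < numDocs; i++) {
            if (scores[i] > 0) hits.add(new Hit(i, scores[i]));
        }
        // Highest score first; ties broken by document id so the order is deterministic.
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> h.doc));
        List<Hit> top = new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
        return new Result(hits.size(), top);
    }
}
```

With n smaller than the number of matches, Result.totalHits still reports all of them, just as topDocs.totalHits can exceed scoreDocs.length.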

Testing the Search

package org.xiaom.lucene;

import java.io.IOException;

public class LuceneTest {
    public static void main(String[] args) throws IOException {
        String docPath = "D:/test1/docs";    // documents to index
        String indexPath = "D:/test1/index"; // where the index is written
        MyIndexCreater.createContainChild(docPath, indexPath);
        // Search the "content" field for the term "adfddd"
        MyIndexSearcher.search(indexPath, "content", "adfddd");
    }
}
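LuceneTest expects D:/test1/docs to already contain some text files, and MyIndexCreater treats the first line of each file as its title and the rest as its content. A hypothetical helper such as SampleDocs below can create that layout; the file names and sentences are made up for illustration, and the content lines are deliberately longer than 80 characters because the search demo prints an 80-character preview of each hit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: writes sample .txt files whose first line is the title, matching how
// MyIndexCreater.indexFile splits a file into the "title" and "content" fields.
class SampleDocs {
    static void write(Path docDir) throws IOException {
        Path sub = Files.createDirectories(docDir.resolve("sub")); // exercises the recursive walk
        // Content lines are longer than 80 characters to suit the 80-character preview.
        Files.write(docDir.resolve("lucene.txt"), Arrays.asList(
                "Lucene intro",
                "Lucene is an open-source Java library that builds search indexes and exposes a simple API for querying them."),
                StandardCharsets.UTF_8);
        Files.write(sub.resolve("index.txt"), Arrays.asList(
                "Inverted index",
                "An inverted index maps each term to the list of documents that contain it, so lookups are fast."),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the docPath used by LuceneTest; pass another directory as the first argument.
        write(Paths.get(args.length > 0 ? args[0] : "D:/test1/docs"));
    }
}
```

Run it once before LuceneTest. With these files, a query such as "inverted" on the content field would be expected to match index.txt, whereas the placeholder term "adfddd" matches nothing in them.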

Here is a download of this Lucene 3.5 getting-started example.

Maintaining the Index

Index maintenance generally involves the following operations:
Adding to the index (see above)
Deleting from the index
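// Deletes every Document (and its index entries) that contains the given Term.
// Assumes an open IndexWriter field named indexWriter, as in MyIndexCreater.
// The deletion is visible to newly opened readers only after indexWriter.commit().
public boolean delete(Term term) {
    boolean rs = true;
    try {
        indexWriter.deleteDocuments(term);
    } catch (IOException e) { // CorruptIndexException is a subclass of IOException
        rs = false;
        e.printStackTrace();
    }
    return rs;
}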
Updating the index (delete, then add)
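// Replaces the Documents containing term with doc: updateDocument deletes every Document
// that contains term, then adds doc (addDocument alone would keep the old version as well).
// Assumes an open IndexWriter field named indexWriter; call commit() to make the change visible.
public boolean update(Term term, Document doc) {
    boolean rs = true;
    try {
        indexWriter.updateDocument(term, doc);
    } catch (IOException e) { // CorruptIndexException is a subclass of IOException
        rs = false;
        e.printStackTrace();
    }
    return rs;
}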
Merging index files
IndexWriter's public void addIndexes(Directory... dirs) merges the indexes in dirs into this IndexWriter; the merged documents take effect after commit.
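All three maintenance operations follow the same pattern: IndexWriter buffers the changes, and a reader opened from the index directory sees them only after commit(). updateDocument(Term, Document) in particular deletes every document containing the term and then adds the new one. The plain-Java model below is a hypothetical sketch of those visibility rules only: ToyWriter, its method names, documents as field-to-value maps, and a field/value pair standing in for a Term are all assumptions, not how Lucene stores or merges segments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of buffered index maintenance. Changes go to a working copy and
// become visible through committedDocs() only after commit().
class ToyWriter {
    private final List<Map<String, String>> working = new ArrayList<>();
    private List<Map<String, String>> committed = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        working.add(new HashMap<>(doc)); // copy so later caller edits do not leak in
    }

    // Deletes every document whose field equals value (stand-in for deleteDocuments(Term)).
    void deleteDocuments(String field, String value) {
        working.removeIf(d -> value.equals(d.get(field)));
    }

    // Delete-then-add, the behavior of updateDocument(Term, Document).
    void updateDocument(String field, String value, Map<String, String> doc) {
        deleteDocuments(field, value);
        addDocument(doc);
    }

    // Copies the committed documents of other writers in (stand-in for addIndexes).
    void addIndexes(ToyWriter... others) {
        for (ToyWriter w : others) {
            for (Map<String, String> d : w.committed) {
                addDocument(d);
            }
        }
    }

    // Publishes the working copy; only now do the changes become visible.
    void commit() {
        committed = new ArrayList<>(working);
    }

    // What a reader opened now would see: the last committed state.
    List<Map<String, String>> committedDocs() {
        return Collections.unmodifiableList(committed);
    }
}
```

The key design point the model captures is that delete, update and merge all change the writer's state immediately, but nothing is published until commit, which is why each of the three operations above ends with a commit.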

0 0